unofficial mirror of guile-user@gnu.org 
* guile-lib <p> handling
From: Michal Herko @ 2020-03-16 13:40 UTC
  To: guile-user

Dear maintainer of guile-lib,
I believe the special handling of <p> elements in the (htmlprag) module
is a bug.
For example:

(use-modules (htmlprag))
(html->shtml "<html><body><div><p>text</p></div></body></html>")
; expected result (*TOP* (html (body (div (p "text")))))
; actual (*TOP* (html (body (div) (p "text"))))

Note that the <p> element ends up outside the <div> element, as its sibling.
I attach a simple patch that removes the special case for <p> elements.


[-- Attachment #2: p.diff --]
[-- Type: text/x-patch, Size: 405 bytes --]

diff --git a/src/htmlprag.scm b/src/htmlprag.scm
index 3bd352b..df99612 100644
--- a/src/htmlprag.scm
+++ b/src/htmlprag.scm
@@ -1099,7 +1099,6 @@
               (meta     . (head))
               (noframes . (frameset))
               (option   . (select))
-              (p        . (body td th))
               (param    . (applet))
               (tbody    . (table))
               (td       . (tr))


* Re: guile-lib <p> handling
From: tomas @ 2020-03-16 17:15 UTC
  To: guile-user

On Mon, Mar 16, 2020 at 02:40:45PM +0100, Michal Herko wrote:
> Dear maintainer of guile-lib.
> I believe the special handling of <p> elements in (htmlprag) module
> to be a bug.
> For example:
> 
> (use-modules (htmlprag))
> (html->shtml "<html><body><div><p>text</p></div></body></html>")
> ; expected result (*TOP* (html (body (div (p "text")))))
> ; actual (*TOP* (html (body (div) (p "text"))))
> 
> Note that the <p> element is parsed outside the <div> element.
> I attach the simple patch to remove the special case for <p> elements.
> 

> diff --git a/src/htmlprag.scm b/src/htmlprag.scm
> index 3bd352b..df99612 100644
> --- a/src/htmlprag.scm
> +++ b/src/htmlprag.scm
> @@ -1099,7 +1099,6 @@
>                (meta     . (head))
>                (noframes . (frameset))
>                (option   . (select))
> -              (p        . (body td th))
>                (param    . (applet))
>                (tbody    . (table))
>                (td       . (tr))

Where did you get htmlprag from? My guess is the Debian
package guile-library.

It seems the upstream isn't maintained anymore [1]. The
Debian package page [2] lists a maintainer you might want
to contact.

That said, you are modifying the parser's "parent constraints";
I would go the other direction and add <div> to the set of
<p>'s possible parents:

>                (option   . (select))
> -              (p        . (body td th))
> +              (p        . (body td th div))
>                (param    . (applet))
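
To make the mechanism concrete, here is a simplified model of how a
parent-constraint table can produce the result from the report. It is a
sketch and not htmlprag's actual code: the table is a cut-down copy of
the one in the diff, and close-until-allowed is a hypothetical helper
name. The assumption is that, when an element opens, every open element
that is not one of its allowed parents gets closed first.

```scheme
;; Cut-down copy of the constraint table (element . allowed-parents).
(define parent-constraints
  '((option . (select))
    (p      . (body td th))
    (param  . (applet))))

;; STACK lists the currently open elements, innermost first.
;; Returns the stack left after closing every open element that is
;; not an allowed parent of ELEM.  Elements without an entry in
;; CONSTRAINTS, or without any allowed ancestor, leave STACK as is.
(define (close-until-allowed elem stack constraints)
  (let ((entry (assq elem constraints)))
    (if (not entry)
        stack
        (let loop ((s stack))
          (cond ((null? s) stack)
                ((memq (car s) (cdr entry)) s)
                (else (loop (cdr s))))))))

;; <p> opening inside <html><body><div>: div is not an allowed
;; parent, so it is closed and <p> becomes a child of body.
(close-until-allowed 'p '(div body html) parent-constraints)
;; => (body html)

;; With div listed as an allowed parent, nothing is closed.
(close-until-allowed 'p '(div body html)
                     '((p . (body td th div))))
;; => (div body html)
```

In this model, deleting the entry for p and adding the enclosing
element to its list both keep <p> inside <div>; the difference is that
deleting the entry drops the constraint for every other parent too.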

HTML has changed a lot since htmlprag saw its heyday.

Cheers

[1] https://planet.racket-lang.org/package-source/neil/htmlprag.plt/1/7/planet-docs/htmlprag/index.html
[2] https://packages.debian.org/buster/guile-library

-- tomás


* Re: guile-lib <p> handling
From: sirgazil @ 2020-03-16 17:16 UTC
  To: Michal Herko; +Cc: guile-user

 ---- On Mon, 16 Mar 2020 08:40:45 -0500 Michal Herko <michal.herko@disroot.org> wrote ----
 > Dear maintainer of guile-lib.
 > I believe the special handling of <p> elements in (htmlprag) module to 
 > be a bug.
 > For example:
 > 
 > (use-modules (htmlprag))
 > (html->shtml "<html><body><div><p>text</p></div></body></html>")
 > ; expected result (*TOP* (html (body (div (p "text")))))
 > ; actual (*TOP* (html (body (div) (p "text"))))
 > 
 > Note that the <p> element is parsed outside the <div> element.
 > I attach the simple patch to remove the special case for <p> elements.
 > 
 > 

Sounds familiar. I stopped using htmlprag a few months ago because of this. I think this older thread is about the same problem:

https://lists.gnu.org/archive/html/guile-user/2019-09/msg00009.html

I'm currently using SXML whenever possible.
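
As a rough illustration of that approach, the document from the report
can be written directly as SXML and serialized with Guile's
(sxml simple) module, so no HTML parsing is involved. This is only a
sketch of one way to read "using SXML"; the names page and sxml->string
are made up for the example.

```scheme
(use-modules (sxml simple))

;; The document from the original report, written directly as SXML.
(define page
  '(html (body (div (p "text")))))

;; sxml->xml writes to the current output port, so capture the
;; output in a string.
(define (sxml->string tree)
  (with-output-to-string
    (lambda () (sxml->xml tree))))

(display (sxml->string page))
(newline)
;; prints <html><body><div><p>text</p></div></body></html>
```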



