all messages for Emacs-related lists mirrored at yhetil.org
* html-parse-string: Ignores the content of SCRIPT tag and the COMMENT tag
@ 2010-09-20 15:38 HAMANO Kiyoto
  2010-09-21 14:42 ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 6+ messages in thread
From: HAMANO Kiyoto @ 2010-09-20 15:38 UTC (permalink / raw)
  To: emacs-devel; +Cc: larsi

[-- Attachment #1: Type: text/plain, Size: 1782 bytes --]

;; I reported this on 9/13, but there has been no reaction, so I am resending it.

Hi, Emacs developers.

The html-parse-string function ignores the content of the script tag
and of comment tags.

[Reproduce]
Evaluate each of the following forms.

;; case A.
(insert (format "%S" (html-parse-string "<script>foo</script>")))

;; case B.
(insert (format "%S" (html-parse-string "<p>foo</p><!-- comment -->")))

;; case C.
(insert (format "%S" (html-parse-string "<!-- comment -->")))

[Result]
The comment after each form shows the actual result.

;; case A.
(insert (format "%S" (html-parse-string "<script>foo</script>")))
;; => (html (head (script nil)))

;; case B.
(insert (format "%S" (html-parse-string "<p>foo</p><!-- comment -->")))
;; => (html (body (p (text . "foo")) nil))

;; case C.
(insert (format "%S" (html-parse-string "<!-- comment -->")))
;; => 34520726 (#o203537226, #x20ebe96)

[Expected result]
I expect results like the following, for example.
The comment after each form shows the expected result.

;; case A.
(insert (format "%S" (html-parse-string "<script>foo</script>")))
;; => (html (head (script (cdata . "foo"))))

;; case B.
(insert (format "%S" (html-parse-string "<p>foo</p><!-- comment -->")))
;; => (html (body (p (text . "foo")) (comment . " comment ")))

;; case C.
(insert (format "%S" (html-parse-string "<!-- comment -->")))
;; => (comment . " comment ")

[Patch]
As a sample, I attach the patch which I made.


;; My environment:
$ emacs --version
GNU Emacs 24.0.50.2
$ uname -a
Linux debian 2.6.35-trunk-686 #1 SMP Mon Sep 6 17:54:16 UTC 2010 i686 GNU/Linux
$ LANG=c apt-cache policy libxml2
libxml2:
  Installed: 2.7.7.dfsg-4
  Candidate: 2.7.7.dfsg-4
  Version table:
 *** 2.7.7.dfsg-4 0
        500 http://ftp.jaist.ac.jp sid/main Packages
        100 /var/lib/dpkg/status

Thanks.

-- 
HAMANO Kiyoto
khiker.mail@gmail.com

[-- Attachment #2: xml.c.patch --]
[-- Type: text/x-patch, Size: 1243 bytes --]

=== modified file 'src/xml.c'
--- src/xml.c	2010-09-14 18:37:26 +0000
+++ src/xml.c	2010-09-20 14:51:27 +0000
@@ -57,13 +57,21 @@
       child = child->next;
     }
     return Fnreverse (result);
-  } else if (node->type == XML_TEXT_NODE) {
+  } else if (node->type == XML_TEXT_NODE ||
+             node->type == XML_COMMENT_NODE) {
     Lisp_Object content = Qnil;
 
     if (node->content)
       content = build_string (node->content);
 
     return Fcons (intern (node->name), content);
+  } else if (node->type == XML_CDATA_SECTION_NODE) {
+    Lisp_Object content = Qnil;
+
+    if (node->content)
+      content = build_string (node->content);
+
+    return Fcons (intern ("cdata"), content);
   } else
     return Qnil;
 }
@@ -96,9 +104,19 @@
 			 XML_PARSE_NOERROR);
 
   if (doc != NULL) {
-    node = xmlDocGetRootElement (doc);
-    if (node != NULL)
-      result = make_dom (node);
+    xmlNode*    n = doc->children->next;
+    Lisp_Object r = Qnil;
+
+    while (n) {
+      if (r != Qnil) result = Fcons (r, result);
+      r = make_dom (n);
+      n = n->next;
+    }
+
+    if (result == Qnil)
+      result = r;
+    else
+      result = Fnreverse (Fcons (r, result));
 
     xmlFreeDoc (doc);
     xmlCleanupParser ();

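The second hunk changes how top-level nodes are combined, which is easier to see in isolation. The following is a minimal sketch of that control flow, not the patch itself: `struct node`, `append`, and `combine_top_level` are hypothetical names, strings stand in for Lisp objects, and it assumes every node yields a non-nil result. It shows that a single top-level node is returned as-is, while several become a bare list of trees.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a parsed top-level node; `dom` is the
   printed form that make_dom would produce for it. */
struct node {
  const char *dom;
  struct node *next;
};

/* Append `s` to the NUL-terminated string in `out`, truncating
   rather than writing past `size` bytes. */
static void append(char *out, size_t size, const char *s)
{
  size_t used = strlen(out);
  if (used + 1 < size)
    snprintf(out + used, size - used, "%s", s);
}

/* Combine top-level siblings as the patched loop does: a lone node
   is returned unchanged, several become a plain list. */
static void combine_top_level(const struct node *n, char *out, size_t size)
{
  if (size == 0)
    return;
  out[0] = '\0';
  if (n == NULL) {
    append(out, size, "nil");
  } else if (n->next == NULL) {
    append(out, size, n->dom);
  } else {
    append(out, size, "(");
    for (const struct node *p = n; p != NULL; p = p->next) {
      append(out, size, p->dom);
      if (p->next != NULL)
        append(out, size, " ");
    }
    append(out, size, ")");
  }
}
```

With a comment node followed by an html node, the output starts with a nested list rather than a tag symbol, so callers would have to handle two different result shapes.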


* Re: html-parse-string: Ignores the content of SCRIPT tag and the COMMENT tag
  2010-09-20 15:38 html-parse-string: Ignores the content of SCRIPT tag and the COMMENT tag HAMANO Kiyoto
@ 2010-09-21 14:42 ` Lars Magne Ingebrigtsen
  2010-09-21 16:04   ` Chong Yidong
  2011-07-20 15:33   ` Chong Yidong
  0 siblings, 2 replies; 6+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-21 14:42 UTC (permalink / raw)
  To: HAMANO Kiyoto; +Cc: emacs-devel

HAMANO Kiyoto <khiker.mail@gmail.com> writes:

> [Patch]
> As a sample, I attach the patch which I made.

Thanks; looks reasonable.  I think this would count as a substantial
patch, so the FSF needs copyright assignment papers for the code.  Do
you already have such papers on file?

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




* Re: html-parse-string: Ignores the content of SCRIPT tag and the COMMENT tag
  2010-09-21 14:42 ` Lars Magne Ingebrigtsen
@ 2010-09-21 16:04   ` Chong Yidong
  2011-07-20 15:33   ` Chong Yidong
  1 sibling, 0 replies; 6+ messages in thread
From: Chong Yidong @ 2010-09-21 16:04 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: HAMANO Kiyoto, emacs-devel

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> HAMANO Kiyoto <khiker.mail@gmail.com> writes:
>
>> [Patch]
>> As a sample, I attach the patch which I made.
>
> Thanks; looks reasonable.  I think this would count as a substantial
> patch, so the FSF needs copyright assignment papers for the code.  Do
> you already have such papers on file?

Nope.  Let's discuss assignment off-list.




* Re: html-parse-string: Ignores the content of SCRIPT tag and the COMMENT tag
  2010-09-21 14:42 ` Lars Magne Ingebrigtsen
  2010-09-21 16:04   ` Chong Yidong
@ 2011-07-20 15:33   ` Chong Yidong
  2011-07-20 20:03     ` Lars Magne Ingebrigtsen
  2011-07-20 20:54     ` Lars Magne Ingebrigtsen
  1 sibling, 2 replies; 6+ messages in thread
From: Chong Yidong @ 2011-07-20 15:33 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: HAMANO Kiyoto, emacs-devel

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> HAMANO Kiyoto <khiker.mail@gmail.com> writes:
>
>> [Patch]
>> As a sample, I attach the patch which I made.
>
> Thanks; looks reasonable.  I think this would count as a substantial
> patch, so the FSF needs copyright assignment papers for the code.  Do
> you already have such papers on file?

Hamano Kiyoto's copyright assignment is now complete; if the patch is
good, could you apply it?  Thanks.




* Re: html-parse-string: Ignores the content of SCRIPT tag and the COMMENT tag
  2011-07-20 15:33   ` Chong Yidong
@ 2011-07-20 20:03     ` Lars Magne Ingebrigtsen
  2011-07-20 20:54     ` Lars Magne Ingebrigtsen
  1 sibling, 0 replies; 6+ messages in thread
From: Lars Magne Ingebrigtsen @ 2011-07-20 20:03 UTC (permalink / raw)
  To: Chong Yidong; +Cc: HAMANO Kiyoto, emacs-devel

Chong Yidong <cyd@stupidchicken.com> writes:

> Hamano Kiyoto's copyright assignment is now complete; if the patch is
> good, could you apply it?  Thanks.

The patch was from September, and some of the stuff it was doing (the
CDATA stuff) was fixed by somebody in December.

The <!-- --> stuff wasn't, though, so I adapted that idea and made it
return the comments.

But I'm applying this bit:

>    if (doc != NULL) {
> -    node = xmlDocGetRootElement (doc);
> -    if (node != NULL)
> -      result = make_dom (node);
> +    xmlNode*    n = doc->children->next;
> +    Lisp_Object r = Qnil;
> +
> +    while (n) {
> +      if (r != Qnil) result = Fcons (r, result);
> +      r = make_dom (n);
> +      n = n->next;
> +    }
> +
> +    if (result == Qnil)
> +      result = r;
> +    else
> +      result = Fnreverse (Fcons (r, result));
 
-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/




* Re: html-parse-string: Ignores the content of SCRIPT tag and the COMMENT tag
  2011-07-20 15:33   ` Chong Yidong
  2011-07-20 20:03     ` Lars Magne Ingebrigtsen
@ 2011-07-20 20:54     ` Lars Magne Ingebrigtsen
  1 sibling, 0 replies; 6+ messages in thread
From: Lars Magne Ingebrigtsen @ 2011-07-20 20:54 UTC (permalink / raw)
  To: Chong Yidong; +Cc: HAMANO Kiyoto, emacs-devel

Chong Yidong <cyd@stupidchicken.com> writes:

> Hamano Kiyoto's copyright assignment is now complete; if the patch is
> good, could you apply it?  Thanks.

If presented with

<!-- comment -->
<p>Text</p>

the patch would return a non-tree structure.  I've now fixed that, but
it's not a good fix.

Normal HTML input such as

<p>Text</p>

returns this:

(html nil (body nil (p nil "Text")))

However, this:

<!-- comment --><p>Text</p>

now returns this:

(top nil (comment nil " comment ") (html nil (body nil (p nil "Text"))))

That's obviously not particularly pleasant.  Anybody have a better idea?
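The behaviour described above can be sketched as follows. This is only an illustration under stated assumptions, not the committed code: `struct node`, `append`, and `wrap_top_level` are hypothetical names, and strings stand in for the Lisp trees. A lone top-level node is returned unchanged, and several are placed under a synthetic `top` node so that the result is always a tree.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a parsed top-level node; `dom` is the
   printed form of the tree built for it. */
struct node {
  const char *dom;
  struct node *next;
};

/* Append `s` to the NUL-terminated string in `out`, truncating
   rather than writing past `size` bytes. */
static void append(char *out, size_t size, const char *s)
{
  size_t used = strlen(out);
  if (used + 1 < size)
    snprintf(out + used, size - used, "%s", s);
}

/* Return a lone top-level node unchanged; wrap several under a
   synthetic (top nil ...) node so the result is always a tree. */
static void wrap_top_level(const struct node *n, char *out, size_t size)
{
  if (size == 0)
    return;
  out[0] = '\0';
  if (n == NULL) {
    append(out, size, "nil");
  } else if (n->next == NULL) {
    append(out, size, n->dom);
  } else {
    append(out, size, "(top nil");
    for (const struct node *p = n; p != NULL; p = p->next) {
      append(out, size, " ");
      append(out, size, p->dom);
    }
    append(out, size, ")");
  }
}
```

The root symbol then depends on the input: `html` for a plain document and `top` when a comment precedes it, which is the asymmetry the message calls unpleasant.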

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/



