* html-parse-string: Ignores the content of SCRIPT tag and the COMMENT tag
From: HAMANO Kiyoto @ 2010-09-20 15:38 UTC
To: emacs-devel; +Cc: larsi
;; I reported this on 9/13, but there has been no reaction, so I am resending it.
Hi, Emacs developers.
html-parse-string ignores the content of script tags and of comments.
[Reproduce]
Evaluate each of the following forms.
;; case A.
(insert (format "%S" (html-parse-string "<script>foo</script>")))
;; case B.
(insert (format "%S" (html-parse-string "<p>foo</p><!-- comment -->")))
;; case C.
(insert (format "%S" (html-parse-string "<!-- comment -->")))
[Result]
The comment after each form shows the actual result.
;; case A.
(insert (format "%S" (html-parse-string "<script>foo</script>")))
;; => (html (head (script nil)))
;; case B.
(insert (format "%S" (html-parse-string "<p>foo</p><!-- comment -->")))
;; => (html (body (p (text . "foo")) nil))
;; case C.
(insert (format "%S" (html-parse-string "<!-- comment -->")))
;; => 34520726 (#o203537226, #x20ebe96)
[Expected result]
I would expect results like the following.
The comment after each form shows the expected result.
;; case A.
(insert (format "%S" (html-parse-string "<script>foo</script>")))
;; => (html (head (script (cdata . "foo"))))
;; case B.
(insert (format "%S" (html-parse-string "<p>foo</p><!-- comment -->")))
;; => (html (body (p (text . "foo")) (comment . " comment ")))
;; case C.
(insert (format "%S" (html-parse-string "<!-- comment -->")))
;; => (comment . " comment ")
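The expected shapes above can be illustrated with a small standalone C sketch. It does not use libxml2 or the Emacs internals: the `Node` struct, the `NodeType` enum and the `render` function are hypothetical stand-ins invented for this illustration, and the output is only a printed string in the notation used above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for a parsed HTML tree; not the libxml2 API. */
typedef enum { NODE_ELEMENT, NODE_TEXT, NODE_COMMENT, NODE_CDATA } NodeType;

typedef struct Node {
  NodeType type;
  const char *name;      /* element name; unused for leaf nodes */
  const char *content;   /* leaf content; NULL for elements */
  struct Node *children; /* first child, or NULL */
  struct Node *next;     /* next sibling, or NULL */
} Node;

/* Append S to the NUL-terminated BUF without exceeding CAP bytes. */
static void append(char *buf, size_t cap, const char *s) {
  size_t used = strlen(buf);
  if (used + 1 < cap)
    strncat(buf, s, cap - used - 1);
}

/* Render NODE in the notation of the expected results:
   elements as (name child...), leaves as (kind . "content").
   A NULL content is shown as an empty string for simplicity. */
static void render(const Node *node, char *buf, size_t cap) {
  assert(node != NULL);
  if (node->type == NODE_ELEMENT) {
    append(buf, cap, "(");
    append(buf, cap, node->name);
    for (const Node *c = node->children; c != NULL; c = c->next) {
      append(buf, cap, " ");
      render(c, buf, cap);
    }
    append(buf, cap, ")");
    return;
  }
  const char *kind = node->type == NODE_TEXT      ? "text"
                     : node->type == NODE_COMMENT ? "comment"
                                                  : "cdata";
  append(buf, cap, "(");
  append(buf, cap, kind);
  append(buf, cap, " . \"");
  append(buf, cap, node->content ? node->content : "");
  append(buf, cap, "\")");
}
```

To try it, build the tree by hand (for case B: a `p` element holding a text node, followed by a comment sibling, both under `body` under `html`), call `render` on the root with an empty buffer, and print the buffer.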
[Patch]
As a sample, I attach the patch which I made.
;; My environment:
$ emacs --version
GNU Emacs 24.0.50.2
$ uname -a
Linux debian 2.6.35-trunk-686 #1 SMP Mon Sep 6 17:54:16 UTC 2010 i686 GNU/Linux
$ LANG=c apt-cache policy libxml2
libxml2:
Installed: 2.7.7.dfsg-4
Candidate: 2.7.7.dfsg-4
Version table:
*** 2.7.7.dfsg-4 0
500 http://ftp.jaist.ac.jp sid/main Packages
100 /var/lib/dpkg/status
Thanks.
--
HAMANO Kiyoto
khiker.mail@gmail.com
[-- Attachment #2: xml.c.patch --]
[-- Type: text/x-patch, Size: 1243 bytes --]
=== modified file 'src/xml.c'
--- src/xml.c 2010-09-14 18:37:26 +0000
+++ src/xml.c 2010-09-20 14:51:27 +0000
@@ -57,13 +57,21 @@
         child = child->next;
       }
     return Fnreverse (result);
-  } else if (node->type == XML_TEXT_NODE) {
+  } else if (node->type == XML_TEXT_NODE ||
+             node->type == XML_COMMENT_NODE) {
     Lisp_Object content = Qnil;
     if (node->content)
       content = build_string (node->content);
     return Fcons (intern (node->name), content);
+  } else if (node->type == XML_CDATA_SECTION_NODE) {
+    Lisp_Object content = Qnil;
+
+    if (node->content)
+      content = build_string (node->content);
+
+    return Fcons (intern ("cdata"), content);
   } else
     return Qnil;
 }
@@ -96,9 +104,19 @@
                          XML_PARSE_NOERROR);
   if (doc != NULL) {
-    node = xmlDocGetRootElement (doc);
-    if (node != NULL)
-      result = make_dom (node);
+    xmlNode* n = doc->children->next;
+    Lisp_Object r = Qnil;
+
+    while (n) {
+      if (r != Qnil) result = Fcons (r, result);
+      r = make_dom (n);
+      n = n->next;
+    }
+
+    if (result == Qnil)
+      result = r;
+    else
+      result = Fnreverse (Fcons (r, result));
     xmlFreeDoc (doc);
     xmlCleanupParser ();
* Re: html-parse-string: Ignores the content of SCRIPT tag and the COMMENT tag
From: Lars Magne Ingebrigtsen @ 2010-09-21 14:42 UTC
To: HAMANO Kiyoto; +Cc: emacs-devel
HAMANO Kiyoto <khiker.mail@gmail.com> writes:
> [Patch]
> As a sample, I attach the patch which I made.
Thanks; looks reasonable. I think this would count as a substantial
patch, so the FSF needs copyright assignment papers for the code. Do
you already have such papers on file?
--
(domestic pets only, the antidote for overdose, milk.)
larsi@gnus.org * Lars Magne Ingebrigtsen
* Re: html-parse-string: Ignores the content of SCRIPT tag and the COMMENT tag
From: Chong Yidong @ 2010-09-21 16:04 UTC
To: Lars Magne Ingebrigtsen; +Cc: HAMANO Kiyoto, emacs-devel
Lars Magne Ingebrigtsen <larsi@gnus.org> writes:
> HAMANO Kiyoto <khiker.mail@gmail.com> writes:
>
>> [Patch]
>> As a sample, I attach the patch which I made.
>
> Thanks; looks reasonable. I think this would count as a substantial
> patch, so the FSF needs copyright assignment papers for the code. Do
> you already have such papers on file?
Nope. Let's discuss assignment off-list.
* Re: html-parse-string: Ignores the content of SCRIPT tag and the COMMENT tag
From: Chong Yidong @ 2011-07-20 15:33 UTC
To: Lars Magne Ingebrigtsen; +Cc: HAMANO Kiyoto, emacs-devel
Lars Magne Ingebrigtsen <larsi@gnus.org> writes:
> HAMANO Kiyoto <khiker.mail@gmail.com> writes:
>
>> [Patch]
>> As a sample, I attach the patch which I made.
>
> Thanks; looks reasonable. I think this would count as a substantial
> patch, so the FSF needs copyright assignment papers for the code. Do
> you already have such papers on file?
Hamano Kiyoto's copyright assignment is now complete; if the patch is
good, could you apply it? Thanks.
* Re: html-parse-string: Ignores the content of SCRIPT tag and the COMMENT tag
From: Lars Magne Ingebrigtsen @ 2011-07-20 20:03 UTC
To: Chong Yidong; +Cc: HAMANO Kiyoto, emacs-devel
Chong Yidong <cyd@stupidchicken.com> writes:
> Hamano Kiyoto's copyright assignment is now complete; if the patch is
> good, could you apply it? Thanks.
The patch was from September, and some of the stuff it was doing (the
CDATA stuff) was fixed by somebody in December.
The <!-- --> stuff wasn't, though, so I adapted that idea and made it
return the comments.
But I'm applying this bit:
> if (doc != NULL) {
> - node = xmlDocGetRootElement (doc);
> - if (node != NULL)
> - result = make_dom (node);
> + xmlNode* n = doc->children->next;
> + Lisp_Object r = Qnil;
> +
> + while (n) {
> + if (r != Qnil) result = Fcons (r, result);
> + r = make_dom (n);
> + n = n->next;
> + }
> +
> + if (result == Qnil)
> + result = r;
> + else
> + result = Fnreverse (Fcons (r, result));
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
* Re: html-parse-string: Ignores the content of SCRIPT tag and the COMMENT tag
From: Lars Magne Ingebrigtsen @ 2011-07-20 20:54 UTC
To: Chong Yidong; +Cc: HAMANO Kiyoto, emacs-devel
Chong Yidong <cyd@stupidchicken.com> writes:
> Hamano Kiyoto's copyright assignment is now complete; if the patch is
> good, could you apply it? Thanks.
If presented with
<!-- comment -->
<p>Text</p>
the patch would return a non-tree structure. I've now fixed that, but
it's not a good fix.
Normal HTML such as this:
<p>Text</p>
returns this:
(html nil (body nil (p nil "Text")))
However, this:
<!-- comment --><p>Text</p>
now returns this:
(top nil (comment nil " comment ") (html nil (body nil (p nil "Text"))))
That's obviously not particularly pleasant. Anybody have a better idea?
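The wrapping behaviour described above can be sketched in standalone C. This is only an illustration of the rule "a lone top-level node is returned as is, several are wrapped in a synthetic top element", not the code in src/xml.c: the `Node` struct and the `render_node` and `render_document` functions are hypothetical, and the result is a printed string rather than a Lisp object.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for a parsed HTML tree; not the libxml2 API. */
typedef enum { NODE_ELEMENT, NODE_TEXT, NODE_COMMENT } NodeType;

typedef struct Node {
  NodeType type;
  const char *name;      /* element name; unused for text and comments */
  const char *content;   /* text or comment content; NULL for elements */
  struct Node *children; /* first child, or NULL */
  struct Node *next;     /* next sibling, or NULL */
} Node;

/* Append S to the NUL-terminated BUF without exceeding CAP bytes. */
static void append(char *buf, size_t cap, const char *s) {
  size_t used = strlen(buf);
  if (used + 1 < cap)
    strncat(buf, s, cap - used - 1);
}

/* Render one node as (name nil child...), "text", or
   (comment nil "content"); attributes are always shown as nil. */
static void render_node(const Node *node, char *buf, size_t cap) {
  assert(node != NULL);
  switch (node->type) {
  case NODE_TEXT:
    append(buf, cap, "\"");
    append(buf, cap, node->content ? node->content : "");
    append(buf, cap, "\"");
    break;
  case NODE_COMMENT:
    append(buf, cap, "(comment nil \"");
    append(buf, cap, node->content ? node->content : "");
    append(buf, cap, "\")");
    break;
  case NODE_ELEMENT:
    append(buf, cap, "(");
    append(buf, cap, node->name);
    append(buf, cap, " nil");
    for (const Node *c = node->children; c != NULL; c = c->next) {
      append(buf, cap, " ");
      render_node(c, buf, cap);
    }
    append(buf, cap, ")");
    break;
  }
}

/* Render all top-level siblings starting at FIRST.  A lone node is
   rendered as is; several are wrapped in a synthetic `top' element
   so that the result is still a single tree. */
static void render_document(const Node *first, char *buf, size_t cap) {
  if (first == NULL) {
    append(buf, cap, "nil");
    return;
  }
  if (first->next == NULL) {
    render_node(first, buf, cap);
    return;
  }
  append(buf, cap, "(top nil");
  for (const Node *n = first; n != NULL; n = n->next) {
    append(buf, cap, " ");
    render_node(n, buf, cap);
  }
  append(buf, cap, ")");
}
```

The drawback raised above is visible in the sketch: callers get a tree rooted at html for ordinary input but at top when a comment precedes the document, so they have to handle both shapes.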
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/