all messages for Emacs-related lists mirrored at yhetil.org
From: HAMANO Kiyoto <khiker.mail@gmail.com>
To: emacs-devel@gnu.org
Cc: larsi@gnus.org
Subject: html-parse-string: Ignores the content of SCRIPT tag and the COMMENT tag
Date: Tue, 21 Sep 2010 00:38:30 +0900
Message-ID: <AANLkTi=VOa_6RBf9p2H=w2Hij5baf+znPjMmCz7-AVZL@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1782 bytes --]

;; I reported this on 9/13 but got no response, so I am resending it.

Hi, Emacs developers.

html-parse-string drops the contents of script tags and of comments.

[Reproduce]
Evaluate each of the following forms.

;; case A.
(insert (format "%S" (html-parse-string "<script>foo</script>")))

;; case B.
(insert (format "%S" (html-parse-string "<p>foo</p><!-- comment -->")))

;; case C.
(insert (format "%S" (html-parse-string "<!-- comment -->")))

[Result]
The comment after each form shows the actual result.

;; case A.
(insert (format "%S" (html-parse-string "<script>foo</script>")))
;; => (html (head (script nil)))

;; case B.
(insert (format "%S" (html-parse-string "<p>foo</p><!-- comment -->")))
;; => (html (body (p (text . "foo")) nil))

;; case C.
(insert (format "%S" (html-parse-string "<!-- comment -->")))
;; => 34520726 (#o203537226, #x20ebe96)

[Expected result]
I would expect something like the following.
The comment after each form shows the expected result.

;; case A.
(insert (format "%S" (html-parse-string "<script>foo</script>")))
;; => (html (head (script (cdata . "foo"))))

;; case B.
(insert (format "%S" (html-parse-string "<p>foo</p><!-- comment -->")))
;; => (html (body (p (text . "foo")) (comment . " comment ")))

;; case C.
(insert (format "%S" (html-parse-string "<!-- comment -->")))
;; => (comment . " comment ")

[Patch]
I attach a sample patch that I wrote.


;; My environment:
$ emacs --version
GNU Emacs 24.0.50.2
$ uname -a
Linux debian 2.6.35-trunk-686 #1 SMP Mon Sep 6 17:54:16 UTC 2010 i686 GNU/Linux
$ LANG=c apt-cache policy libxml2
libxml2:
  Installed: 2.7.7.dfsg-4
  Candidate: 2.7.7.dfsg-4
  Version table:
 *** 2.7.7.dfsg-4 0
        500 http://ftp.jaist.ac.jp sid/main Packages
        100 /var/lib/dpkg/status

Thanks.

-- 
HAMANO Kiyoto
khiker.mail@gmail.com

[-- Attachment #2: xml.c.patch --]
[-- Type: text/x-patch, Size: 1243 bytes --]

=== modified file 'src/xml.c'
--- src/xml.c	2010-09-14 18:37:26 +0000
+++ src/xml.c	2010-09-20 14:51:27 +0000
@@ -57,13 +57,21 @@
       child = child->next;
     }
     return Fnreverse (result);
-  } else if (node->type == XML_TEXT_NODE) {
+  } else if (node->type == XML_TEXT_NODE ||
+             node->type == XML_COMMENT_NODE) {
     Lisp_Object content = Qnil;
 
     if (node->content)
       content = build_string (node->content);
 
     return Fcons (intern (node->name), content);
+  } else if (node->type == XML_CDATA_SECTION_NODE) {
+    Lisp_Object content = Qnil;
+
+    if (node->content)
+      content = build_string (node->content);
+
+    return Fcons (intern ("cdata"), content);
   } else
     return Qnil;
 }
@@ -96,9 +104,19 @@
 			 XML_PARSE_NOERROR);
 
   if (doc != NULL) {
-    node = xmlDocGetRootElement (doc);
-    if (node != NULL)
-      result = make_dom (node);
+    xmlNode*    n = doc->children->next;
+    Lisp_Object r = Qnil;
+
+    while (n) {
+      if (r != Qnil) result = Fcons (r, result);
+      r = make_dom (n);
+      n = n->next;
+    }
+
+    if (result == Qnil)
+      result = r;
+    else
+      result = Fnreverse (Fcons (r, result));
 
     xmlFreeDoc (doc);
     xmlCleanupParser ();



Thread overview: 6+ messages
2010-09-20 15:38 HAMANO Kiyoto [this message]
2010-09-21 14:42 ` html-parse-string: Ignores the content of SCRIPT tag and the COMMENT tag Lars Magne Ingebrigtsen
2010-09-21 16:04   ` Chong Yidong
2011-07-20 15:33   ` Chong Yidong
2011-07-20 20:03     ` Lars Magne Ingebrigtsen
2011-07-20 20:54     ` Lars Magne Ingebrigtsen
