From: Lars Magne Ingebrigtsen
Newsgroups: gmane.emacs.devel
Subject: Re: Problems with xml-parse-string
Date: Wed, 22 Sep 2010 18:12:54 +0200
To: emacs-devel@gnu.org

Chong Yidong writes:

> First let me clarify a technical detail.  In your new format, [...]
> seems to assume that element names never start with the colon character.
> That is, there can never be an element named ":type".
>
> The XML spec (http://www.w3.org/TR/2008/REC-xml-20081126/) seems to
> indicate that element names are allowed to start with a colon; see the
> definition of NameStartChar in section 2.3.
>
> It looks like the new format would give ambiguous results in that case.

True.  Like I said, I only wanted it for the HTML case, and the XML case
was just an afterthought.  And in HTML, there can be no :tags.

Looking at the output from xml.el and xml.c on the same RSS feed, the
biggest change doesn't seem to be the format, but the actual data.

First the xml.el parser:

  (pp xml (current-buffer))
  ((rdf:RDF
    ((xmlns:rdf . "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
     (xmlns . "http://purl.org/rss/1.0/")
     (xmlns:taxo . "http://purl.org/rss/1.0/modules/taxonomy/")
     (xmlns:dc . "http://purl.org/dc/elements/1.1/")
     (xmlns:syn . "http://purl.org/rss/1.0/modules/syndication/")
     (xmlns:admin . "http://webns.net/mvcb/"))
    "\n "
    (channel
     ((rdf:about . "http://blog.gmane.org/gmane.discuss"))
     "\n "
     (title nil "gmane.discuss")
     "\n "
     (link nil "http://blog.gmane.org/gmane.discuss")
     "\n "
     (description nil
                  (""))
     "\n "
     (syn:updatePeriod nil "hourly")
     "\n "
     (syn:updateFrequency nil "1")
     "\n "
     (syn:updateBase nil "1901-01-01T00:00+00:00")
     "\n "
     (items nil "\n "
            (rdf:Seq nil "\n "
                     (rdf:li
                      ((rdf:resource . "http://permalink.gmane.org/gmane.discuss/13574"))
                      (""))
                     "\n "
                     (rdf:li

Then the same thing from the xml.c parser:

  (pp nxml (current-buffer))
  (RDF
   (text . "\n ")
   (channel
    (:about . "http://blog.gmane.org/gmane.discuss")
    (text . "\n ")
    (title
     (text . "gmane.discuss"))
    (text . "\n ")
    (link
     (text . "http://blog.gmane.org/gmane.discuss"))
    (text . "\n ")
    (description)
    (text . "\n ")
    (updatePeriod
     (text . "hourly"))
    (text . "\n ")
    (updateFrequency
     (text . "1"))
    (text . "\n ")
    (updateBase
     (text . "1901-01-01T00:00+00:00"))
    (text . "\n ")
    (items
     (text . "\n ")
     (Seq
      (text . "\n ")
      (li
       (:resource . "http://permalink.gmane.org/gmane.discuss/13574"))
      (text . "\n ")
      (li

So more work is needed to turn the xml.c parser into something that's
compatible with what xml.el users expect: in the output above, the
namespace prefixes (rdf:, syn:) and the xmlns attributes are gone, and
the empty <description> element comes out differently.

Anyway, back to the format thing -- if we disregard the :tag issue
(i.e., find a work-around), then it would be pretty trivial to write a
function to convert the output from libxml-parse-xml-region into what
the xml.el package returns.  (Not to mention the nxml.el package, which
does the same as the xml.el package?)

It'd still be faster than the pure Elisp version, and Gnus can call
libxml-parse-html-region (as planned) to render HTML as quickly and
conveniently as possible.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen