From: "Eric M. Ludlam"
Newsgroups: gmane.emacs.devel
Subject: Re: Linking Emacs with libxml2
Date: Tue, 14 Sep 2010 20:55:23 -0400
Message-ID: <4C90197B.9050808@siege-engine.com>
References: <87bp8asms0.fsf@stupidchicken.com>
In-Reply-To: <87bp8asms0.fsf@stupidchicken.com>
To: Chong Yidong
Cc: "Eric M. Ludlam", emacs-devel@gnu.org

Hi,

Sorry for the late reply.

The Semantic parser is a cheap regexp matcher that just looks for titles
and lines so things like Speedbar can show a high-level overview of your
text.  It would be much better to use a real parser, if one is available,
to have that info in one place.

Eric

On 09/06/2010 03:19 PM, Chong Yidong wrote:
> Lars Magne Ingebrigtsen writes:
>
>> Apparently libxml2 comes with a parser for "real world" HTML, which is
>> very intriguing:
>>
>> http://www.xmlsoft.org/html/libxml-HTMLparser.html
>>
>> If Emacs provided a native interface to this function, we could say
>>
>> (parse-html "file.html")
>> => (:html (:head ...) (:body ...))
>>
>> and get a nice parse tree out very fast.  (Parsing HTML from Emacs Lisp
>> is rather slow.)
>
> Semantic already has a HTML parser, but I don't know how usable it is
> for the purposes of writing a renderer.
>