unofficial mirror of emacs-devel@gnu.org 
* Linking Emacs with libxml2
@ 2010-09-06 15:21 Lars Magne Ingebrigtsen
  2010-09-06 15:54 ` Wojciech Meyer
                   ` (4 more replies)
  0 siblings, 5 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-06 15:21 UTC (permalink / raw)
  To: emacs-devel

Apparently libxml2 comes with a parser for "real world" HTML, which is
very intriguing:

http://www.xmlsoft.org/html/libxml-HTMLparser.html

If Emacs provided a native interface to this function, we could say

(parse-html "file.html")
=> (:html (:head ...) (:body ...))

and get a nice parse tree out very fast.  (Parsing HTML from Emacs Lisp
is rather slow.)

Has this been discussed before and rejected?  It seems like an obvious
idea, and would enable both easier extraction of data from HTML files,
as well as writing a (simple) HTML renderer in Emacs Lisp.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-06 15:21 Linking Emacs with libxml2 Lars Magne Ingebrigtsen
@ 2010-09-06 15:54 ` Wojciech Meyer
  2010-09-06 18:26 ` Chad Brown
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 70+ messages in thread
From: Wojciech Meyer @ 2010-09-06 15:54 UTC (permalink / raw)
  To: emacs-devel

On Mon, Sep 6, 2010 at 4:21 PM, Lars Magne Ingebrigtsen <larsi@gnus.org> wrote:
> Apparently libxml2 comes with a parser for "real world" HTML, which is
> very intriguing:
>
> http://www.xmlsoft.org/html/libxml-HTMLparser.html
>
> If Emacs provided a native interface to this function, we could say
>
> (parse-html "file.html")
> => (:html (:head ...) (:body ...))

Moreover, it would be nice to have an SXML output that could be
queried with XPath.

Wojciech




* Re: Linking Emacs with libxml2
  2010-09-06 15:21 Linking Emacs with libxml2 Lars Magne Ingebrigtsen
  2010-09-06 15:54 ` Wojciech Meyer
@ 2010-09-06 18:26 ` Chad Brown
  2010-09-06 21:01   ` Lars Magne Ingebrigtsen
  2010-09-06 18:44 ` Lennart Borgman
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 70+ messages in thread
From: Chad Brown @ 2010-09-06 18:26 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel


On Sep 6, 2010, at 8:21 AM, Lars Magne Ingebrigtsen wrote:
>  (Parsing HTML from Emacs Lisp is rather slow.)

Do you have a feel for how much slower the elisp-based parsing is?  Is it causing noticeable delays in Gnus, for example?

> Has this been discussed before and rejected?  It seems like an obvious
> idea, and would enable both easier extraction of data from HTML files,
> as well as writing a (simple) HTML renderer in Emacs Lisp.

The various legal hurdles/barriers impeding dynamic library use in emacs are (mostly?) overcome, and this seems like a fine potential candidate.  I suspect that ideally we'd have lisp calls that would call a library parser if available and an elisp parser if not.  What do you think we'd want (in terms of speed/convenience tradeoff) in addition to parse-html?  I'd guess that we'd like to be able to parse both a named file and a buffer (but perhaps working on buffers would lose us the speed of the library?).
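
A minimal C sketch of the "library parser if available, fallback if not" idea follows.  It is an illustration under assumptions, not code from this thread: the function name native_html_root_name and its return convention are hypothetical, and HAVE_LIBXML2 is assumed to be defined by the build only when libxml2 was found.  The libxml2 calls used (htmlReadMemory, xmlDocGetRootElement, xmlFreeDoc) are real library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#ifdef HAVE_LIBXML2
#include <libxml/HTMLparser.h>
#endif

/* Hypothetical helper: parse LEN bytes of HTML and copy the name of the
   root element into NAME_OUT (capacity CAP).
   Returns 1 on success, 0 if parsing failed, and -1 when built without
   libxml2, in which case the caller should use another parser.  */
int
native_html_root_name (const char *html, int len, char *name_out, size_t cap)
{
  assert (html != NULL && name_out != NULL && cap > 0);
#ifdef HAVE_LIBXML2
  /* HTML_PARSE_RECOVER asks the parser to tolerate malformed markup.  */
  htmlDocPtr doc = htmlReadMemory (html, len, NULL, "utf-8",
                                   HTML_PARSE_RECOVER | HTML_PARSE_NOERROR
                                   | HTML_PARSE_NOWARNING);
  if (doc == NULL)
    return 0;
  xmlNodePtr root = xmlDocGetRootElement (doc);
  int ok = root != NULL && root->name != NULL;
  if (ok)
    snprintf (name_out, cap, "%s", (const char *) root->name);
  xmlFreeDoc (doc);
  return ok;
#else
  (void) len;
  return -1;   /* No native parser compiled in.  */
#endif
}
```

A caller would treat -1 as "no native support" and hand the same text to a slower parser, which keeps the choice of backend invisible to the code that consumes the parse result.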

I can't totally commit to building anything, but I'm interested in the dynamic library interface to Emacs (interested but not yet knowledgeable), and I might be able to put some time into helping out.

*Chad



* Re: Linking Emacs with libxml2
  2010-09-06 15:21 Linking Emacs with libxml2 Lars Magne Ingebrigtsen
  2010-09-06 15:54 ` Wojciech Meyer
  2010-09-06 18:26 ` Chad Brown
@ 2010-09-06 18:44 ` Lennart Borgman
  2010-09-06 18:56   ` Chad Brown
  2010-09-06 19:19 ` Chong Yidong
  2010-09-06 21:08 ` Stefan Monnier
  4 siblings, 1 reply; 70+ messages in thread
From: Lennart Borgman @ 2010-09-06 18:44 UTC (permalink / raw)
  To: emacs-devel

On Mon, Sep 6, 2010 at 5:21 PM, Lars Magne Ingebrigtsen <larsi@gnus.org> wrote:
> Apparently libxml2 comes with a parser for "real world" HTML, which is
> very intriguing:
>
> http://www.xmlsoft.org/html/libxml-HTMLparser.html
>
> If Emacs provided a native interface to this function, we could say
>
> (parse-html "file.html")
> => (:html (:head ...) (:body ...))
>
> and get a nice parse tree out very fast.  (Parsing HTML from Emacs Lisp
> is rather slow.)
>
> Has this been discussed before and rejected?  It seems like an obvious
> idea, and would enable both easier extraction of data from HTML files,
> as well as writing a (simple) HTML renderer in Emacs Lisp.

It was discussed before here:

  http://lists.gnu.org/archive/html/emacs-devel/2007-06/msg01147.html

Wasn't there a problem with linking to external libraries at that time?




* Re: Linking Emacs with libxml2
  2010-09-06 18:44 ` Lennart Borgman
@ 2010-09-06 18:56   ` Chad Brown
  2010-09-06 19:08     ` Chong Yidong
  2010-09-06 19:17     ` joakim
  0 siblings, 2 replies; 70+ messages in thread
From: Chad Brown @ 2010-09-06 18:56 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: emacs-devel


On Sep 6, 2010, at 11:44 AM, Lennart Borgman wrote:

> On Mon, Sep 6, 2010 at 5:21 PM, Lars Magne Ingebrigtsen <larsi@gnus.org> wrote:
>> Apparently libxml2 comes with a parser for "real world" HTML, which is
>> very intriguing:
>> 
>> http://www.xmlsoft.org/html/libxml-HTMLparser.html
>> 
>> If Emacs provided a native interface to this function, we could say
>> 
>> (parse-html "file.html")
>> => (:html (:head ...) (:body ...))
>> 
>> and get a nice parse tree out very fast.  (Parsing HTML from Emacs Lisp
>> is rather slow.)
>> 
>> Has this been discussed before and rejected?  It seems like an obvious
>> idea, and would enable both easier extraction of data from HTML files,
>> as well as writing a (simple) HTML renderer in Emacs Lisp.
> 
> It was discussed before here:
> 
>  http://lists.gnu.org/archive/html/emacs-devel/2007-06/msg01147.html
> 
> Wasn't there a problem with linking to external libraries at that time?

Yes, there was.  The FSF [lawyers] recently determined that it would be 
possible to use external libraries with some explicit marking of legal status, 
along the lines of what is used in GCC.  Looking back through the mail 
archives, it seems that practical implementation is stuck waiting on an FFI 
design/implementation.  I thought that one had been sketched out, but I'm 
not finding it in the archives, so perhaps I am confused.   

*Chad



* Re: Linking Emacs with libxml2
  2010-09-06 18:56   ` Chad Brown
@ 2010-09-06 19:08     ` Chong Yidong
  2010-09-06 19:17     ` joakim
  1 sibling, 0 replies; 70+ messages in thread
From: Chong Yidong @ 2010-09-06 19:08 UTC (permalink / raw)
  To: Chad Brown; +Cc: Lennart Borgman, emacs-devel

Chad Brown <yandros@MIT.EDU> writes:

>> Wasn't there a problem with linking to external libraries at that time?
>
> Yes, there was.  The FSF [lawyers] recently determined that it would
> be possible to use external libraries with some explicit marking of
> legal status, along the lines of what is used in GCC.

This refers only to FFI.  Using libxml as a static dependency has never
been a problem, apart from needing someone to implement it.

> Looking back through the mail archives, it seems that practical
> implementation is stuck waiting on an FFI design/implementation.  I
> thought that one had been sketched out, but I'm not finding it in the
> archives, so perhaps I am confused.

It's already been finalized:

http://www.gnu.org/prep/standards/html_node/Dynamic-Plug_002dIn-Interfaces.html




* Re: Linking Emacs with libxml2
  2010-09-06 18:56   ` Chad Brown
  2010-09-06 19:08     ` Chong Yidong
@ 2010-09-06 19:17     ` joakim
  2010-09-07  0:36       ` Jason Rumney
  1 sibling, 1 reply; 70+ messages in thread
From: joakim @ 2010-09-06 19:17 UTC (permalink / raw)
  To: Chad Brown; +Cc: Lennart Borgman, emacs-devel

Chad Brown <yandros@MIT.EDU> writes:

> On Sep 6, 2010, at 11:44 AM, Lennart Borgman wrote:
>
>> On Mon, Sep 6, 2010 at 5:21 PM, Lars Magne Ingebrigtsen <larsi@gnus.org> wrote:
>>> Apparently libxml2 comes with a parser for "real world" HTML, which is
>>> very intriguing:
>>> 
>>> http://www.xmlsoft.org/html/libxml-HTMLparser.html
>>> 
>>> If Emacs provided a native interface to this function, we could say
>>> 
>>> (parse-html "file.html")
>>> => (:html (:head ...) (:body ...))
>>> 
>>> and get a nice parse tree out very fast.  (Parsing HTML from Emacs Lisp
>>> is rather slow.)
>>> 
>>> Has this been discussed before and rejected?  It seems like an obvious
>>> idea, and would enable both easier extraction of data from HTML files,
>>> as well as writing a (simple) HTML renderer in Emacs Lisp.
>> 
>> It was discussed before here:
>> 
>>  http://lists.gnu.org/archive/html/emacs-devel/2007-06/msg01147.html
>> 
>> Wasn't there a problem with linking to external libraries at that time?
>
> Yes, there was.  The FSF [lawyers] recently determined that it would be 
> possible to use external libraries with some explicit marking of legal status, 
> along the lines of what is used in GCC.  Looking back through the mail 
> archives, it seems that practical implementation is stuck waiting on an FFI 
> design/implementation.  I thought that one had been sketched out, but I'm 
> not finding it in the archives, so perhaps I am confused.   
>

We don't need a generic FFI to link with libxml2, so that's not hindering
adoption. What we would need is an interface specification for xml
parsing in Emacs, and a specific implementation for libxml2.

The problem is rather that adding more dependencies complicates
maintenance of Emacs, and that maintaining a process buffer style
interface is perceived as simpler. Given that sentiment, strong
arguments need to be presented in favour of in-process handling.

FWIW my personal opinion is that linking with well established external
libraries is no big deal.

> *Chad

-- 
Joakim Verona




* Re: Linking Emacs with libxml2
  2010-09-06 15:21 Linking Emacs with libxml2 Lars Magne Ingebrigtsen
                   ` (2 preceding siblings ...)
  2010-09-06 18:44 ` Lennart Borgman
@ 2010-09-06 19:19 ` Chong Yidong
  2010-09-06 21:03   ` Lars Magne Ingebrigtsen
  2010-09-15  0:55   ` Eric M. Ludlam
  2010-09-06 21:08 ` Stefan Monnier
  4 siblings, 2 replies; 70+ messages in thread
From: Chong Yidong @ 2010-09-06 19:19 UTC (permalink / raw)
  To: emacs-devel; +Cc: Eric M. Ludlam

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Apparently libxml2 comes with a parser for "real world" HTML, which is
> very intriguing:
>
> http://www.xmlsoft.org/html/libxml-HTMLparser.html
>
> If Emacs provided a native interface to this function, we could say
>
> (parse-html "file.html")
> => (:html (:head ...) (:body ...))
>
> and get a nice parse tree out very fast.  (Parsing HTML from Emacs Lisp
> is rather slow.)

Semantic already has a HTML parser, but I don't know how usable it is
for the purposes of writing a renderer.




* Re: Linking Emacs with libxml2
  2010-09-06 18:26 ` Chad Brown
@ 2010-09-06 21:01   ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-06 21:01 UTC (permalink / raw)
  To: emacs-devel

Chad Brown <yandros@MIT.EDU> writes:

> Do you have a feel for how much slower the elisp-based parsing is?  Is
> it causing noticeable delays in Gnus, for example?

Gnus doesn't really parse HTML, but the experience with w3 shows that
it's quite slow.

> I can't totally commit to building anything, but I'm interested in the
> dynamic library interface to emacs (interested but not yet
> knowledgable), and I might be able to put some time into helping out.

This wouldn't be a dynamic library interface thing, but a statically
compiled interface.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: Linking Emacs with libxml2
  2010-09-06 19:19 ` Chong Yidong
@ 2010-09-06 21:03   ` Lars Magne Ingebrigtsen
  2010-09-15  0:55   ` Eric M. Ludlam
  1 sibling, 0 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-06 21:03 UTC (permalink / raw)
  To: emacs-devel

Chong Yidong <cyd@stupidchicken.com> writes:

> Semantic already has a HTML parser, but I don't know how usable it is
> for the purposes of writing a renderer.

I'm not very familiar with Semantic or its parser, but from doing a
quick Google, it doesn't seem to return a complete HTML parse tree,
which is what you'd need to create a renderer, I think.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: Linking Emacs with libxml2
  2010-09-06 15:21 Linking Emacs with libxml2 Lars Magne Ingebrigtsen
                   ` (3 preceding siblings ...)
  2010-09-06 19:19 ` Chong Yidong
@ 2010-09-06 21:08 ` Stefan Monnier
  2010-09-06 21:17   ` Lars Magne Ingebrigtsen
  2010-09-06 21:18   ` Lennart Borgman
  4 siblings, 2 replies; 70+ messages in thread
From: Stefan Monnier @ 2010-09-06 21:08 UTC (permalink / raw)
  To: emacs-devel

> Apparently libxml2 comes with a parser for "real world" HTML, which is
> very intriguing:
[...]
> Has this been discussed before and rejected?  It seems like an obvious
> idea, and would enable both easier extraction of data from HTML files,
> as well as writing a (simple) HTML renderer in Emacs Lisp.

It's an obvious idea, but I think it's a fair bit of work:
- you'll probably want your function to be able to read from a buffer
  rather than from a file (reading from a file would slow down the
  operation to a point where using a separate xml-to-elisp executable
  isn't that much worse).
- parsing HTML is the easy part, rendering it in Emacs is a lot
  more difficult.


        Stefan




* Re: Linking Emacs with libxml2
  2010-09-06 21:08 ` Stefan Monnier
@ 2010-09-06 21:17   ` Lars Magne Ingebrigtsen
  2010-09-06 21:30     ` joakim
  2010-09-07  1:40     ` Chad Brown
  2010-09-06 21:18   ` Lennart Borgman
  1 sibling, 2 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-06 21:17 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> It's an obvious idea, but I think it's a fair bit of work:
> - you'll probably want your function to be able to read from a buffer
>   rather than from a file (reading from a file would slow down the
>   operation to a point where using a separate xml-to-elisp executable
>   isn't that much worse).

That's actually the main entry point for the library:

http://www.xmlsoft.org/html/libxml-HTMLparser.html#htmlParseDoc

Well, you have to convert the buffer to a string, but...

> - parsing HTML is the easy part, rendering it in Emacs is a lot
>   more difficult.

Well, parsing real-world HTML is quite tricky, but you're right in that
the major part of this work wouldn't be hooking libxml2 into Emacs
(probably a day's work for somebody who knows what they're doing, and
three days for me?), but writing an HTML renderer.  I've been looking to
see whether there are any C libraries for rendering HTML, but I haven't
found anything.  (Well, except Gecko and Webkit, but 1) we probably
don't want to make Emacs dependent on those very large libraries, and 2)
they're oriented towards more graphical environments than Emacs.)

But I'm kinda unsure how much work writing an HTML renderer would be, if
you had access to a sensible parse tree.  My guess would be that you
could have something that rendered 80% of pages very nicely with one
week's worth of work.  And I take those numbers out of the air, but
that's the vague feeling I have...

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: Linking Emacs with libxml2
  2010-09-06 21:08 ` Stefan Monnier
  2010-09-06 21:17   ` Lars Magne Ingebrigtsen
@ 2010-09-06 21:18   ` Lennart Borgman
  1 sibling, 0 replies; 70+ messages in thread
From: Lennart Borgman @ 2010-09-06 21:18 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

On Mon, Sep 6, 2010 at 11:08 PM, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
>> Apparently libxml2 comes with a parser for "real world" HTML, which is
>> very intriguing:
> [...]
>> Has this been discussed before and rejected?  It seems like an obvious
>> idea, and would enable both easier extraction of data from HTML files,
>> as well as writing a (simple) HTML renderer in Emacs Lisp.
>
> It's an obvious idea, but I think it's a fair bit of work:
> - you'll probably want your function to be able to read from a buffer
>  rather than from a file (reading from a file would slow down the
>  operation to a point where using a separate xml-to-elisp executable
>  isn't that much worse).
> - parsing HTML is the easy part, rendering it in Emacs is a lot
>  more difficult.


But perhaps libxml2 can be used by Semantic?

Though I do not know if that is interesting.




* Re: Linking Emacs with libxml2
  2010-09-06 21:17   ` Lars Magne Ingebrigtsen
@ 2010-09-06 21:30     ` joakim
  2010-09-07  1:40     ` Chad Brown
  1 sibling, 0 replies; 70+ messages in thread
From: joakim @ 2010-09-06 21:30 UTC (permalink / raw)
  To: emacs-devel

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
>> It's an obvious idea, but I think it's a fair bit of work:
>> - you'll probably want your function to be able to read from a buffer
>>   rather than from a file (reading from a file would slow down the
>>   operation to a point where using a separate xml-to-elisp executable
>>   isn't that much worse).
>
> That's actually the main entry point for the library:
>
> http://www.xmlsoft.org/html/libxml-HTMLparser.html#htmlParseDoc
>
> Well, you have to convert the buffer to a string, but...
>
>> - parsing HTML is the easy part, rendering it in Emacs is a lot
>>   more difficult.
>
> Well, parsing real-world HTML is quite tricky, but you're right in that
> the major part of this work wouldn't be hooking libxml2 into Emacs
> (probably a day's work for somebody who knows what they're doing, and
> three days for me?), but writing an HTML renderer.  I've been looking to
> see whether there are any C libraries for rendering HTML, but I haven't
> found anything.  (Well, except Gecko and Webkit, but 1) we probably
> don't want to make Emacs dependent on those very large libraries, and 2)
> they're oriented towards more graphical environments than Emacs.)

Here I'd like to shamelessly plug my xwidget Emacs branch, which allows
for embedding, for instance, the WebKit-based uzbl browser inside
Emacs. See the ezbl project for specific use of xwidgets together with
uzbl. OK, so it's only demo-ware code, but it's still interesting, I think.

> But I'm kinda unsure how much work writing an HTML renderer would be, if
> you had access to a sensible parse tree.  My guess would be that you
> could have something that rendered 80% of pages very nicely with one
> week's worth of work.  And I take those numbers out of the air, but
> that's the vague feeling I have...

-- 
Joakim Verona




* Re: Linking Emacs with libxml2
  2010-09-06 19:17     ` joakim
@ 2010-09-07  0:36       ` Jason Rumney
  2010-09-07  0:58         ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 70+ messages in thread
From: Jason Rumney @ 2010-09-07  0:36 UTC (permalink / raw)
  To: emacs-devel

On 07/09/2010 03:17, joakim@verona.se wrote:
> We don't need a generic FFI to link with libxml2, so that's not hindering
> adoption. What we would need is an interface specification for xml
> parsing in Emacs, and a specific implementation for libxml2.
>
> The problem is rather that adding more dependencies complicates
> maintenance of Emacs, and that maintaining a process buffer style
> interface is perceived as simpler. Given that sentiment, strong
> arguments need to be presented in favour of in-process handling.
>    

libxml2 is already linked if librsvg is being linked, so it isn't even 
an additional dependency.





* Re: Linking Emacs with libxml2
  2010-09-07  0:36       ` Jason Rumney
@ 2010-09-07  0:58         ` Lars Magne Ingebrigtsen
  2010-09-08 14:10           ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-07  0:58 UTC (permalink / raw)
  To: emacs-devel

Jason Rumney <jasonr@gnu.org> writes:

> libxml2 is already linked if librsvg is being linked, so it isn't even
> an additional dependency.

So it is.

I'll take a whack at providing an interface to htmlParseDoc(), then, if
nobody else beats me to it...

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: Linking Emacs with libxml2
  2010-09-06 21:17   ` Lars Magne Ingebrigtsen
  2010-09-06 21:30     ` joakim
@ 2010-09-07  1:40     ` Chad Brown
  2010-09-07  1:47       ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 70+ messages in thread
From: Chad Brown @ 2010-09-07  1:40 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

I was assuming that people would object to adding a dependency on 
libxml2.  It seems that I was the only one making that assumption, 
especially since it's pulled in for librsvg already.  My apologies for not 
saying that up-front.

On Sep 6, 2010, at 2:17 PM, Lars Magne Ingebrigtsen wrote:

>> - parsing HTML is the easy part, rendering it in Emacs is a lot
>>  more difficult.
> 
> Well, parsing real-world HTML is quite tricky, but you're right in that
> the major part of this work wouldn't be hooking libxml2 into Emacs
> (probably a day's work for somebody who knows what they're doing, and
> three days for me?), but writing an HTML renderer.  I've been looking to
> see whether there are any C libraries for rendering HTML, but I haven't
> found anything.  (Well, except Gecko and Webkit, but 1) we probably
> don't want to make Emacs dependent on those very large libraries, and 2)
> they're oriented towards more graphical environments than Emacs.)
> 
> But I'm kinda unsure how much work writing an HTML renderer would be, if
> you had access to a sensible parse tree.  My guess would be that you
> could have something that rendered 80% of pages very nicely with one
> week's worth of work.  And I take those numbers out of the air, but
> that's the vague feeling I have...

You might want to take a look at w3m (MIT License) or links 2 (GPL) for some
examples of text-based rendering with emacs-like image support.  I don't know 
that either will be preferable to rendering in elisp, but they might at least suggest
where to expect difficulty.  I would personally expect the troubles to pop up around
tables, CSS, and javascript.

	http://sourceforge.net/projects/w3m/
	http://links.twibright.com/

*Chad






* Re: Linking Emacs with libxml2
  2010-09-07  1:40     ` Chad Brown
@ 2010-09-07  1:47       ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-07  1:47 UTC (permalink / raw)
  To: emacs-devel

Chad Brown <yandros@MIT.EDU> writes:

> I would personally expect the troubles to pop up around tables, CSS,
> and javascript.

I don't plan on supporting CSS or JavaScript, so it's mainly a question
of rendering tables in a sane way.  :-)  The rest should be easy.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: Linking Emacs with libxml2
  2010-09-07  0:58         ` Lars Magne Ingebrigtsen
@ 2010-09-08 14:10           ` Lars Magne Ingebrigtsen
  2010-09-08 14:25             ` Andreas Schwab
  2010-09-08 14:40             ` Stefan Monnier
  0 siblings, 2 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-08 14:10 UTC (permalink / raw)
  To: emacs-devel

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> I'll take a whack at providing an interface to htmlParseDoc(), then, if
> nobody else beats me to it...

My main problem is, of course, the most trivial one -- how do I take a
(narrowed) buffer, apply the charset decoding methods, and then end up with
a C string that I can feed to the library?  There must be a convenient
utility function somewhere, but I haven't been able to find it.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: Linking Emacs with libxml2
  2010-09-08 14:10           ` Lars Magne Ingebrigtsen
@ 2010-09-08 14:25             ` Andreas Schwab
  2010-09-08 14:40             ` Stefan Monnier
  1 sibling, 0 replies; 70+ messages in thread
From: Andreas Schwab @ 2010-09-08 14:25 UTC (permalink / raw)
  To: emacs-devel

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> My main problem is, of course, the most trivial one -- how do I take a
> (narrowed) buffer, apply the charset decoding methods, and then end up with

Do you mean encoding?

> a C string that I can feed to the library?  There must be a convenient
> utility function somewhere, but I haven't been able to find it.

Do you really need that C string?  I think it would be more efficient if
you could pass it a function that delivers the characters while encoding
the buffer contents on the fly.

Otherwise, encode-coding-region should do what you need.
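
The "function that delivers the characters" approach can be sketched in C as a read callback.  libxml2's htmlReadIO accepts a callback of the form int (*)(void *context, char *buffer, int len); everything else below is hypothetical: the struct two_part_source and the function two_part_read are illustrative names, the two segments stand in for buffer text on either side of a gap, and no encoding step is performed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical source made of two contiguous pieces of text.  */
struct two_part_source
{
  const char *before;   /* first segment */
  size_t before_len;
  const char *after;    /* second segment */
  size_t after_len;
  size_t pos;           /* logical read position across both segments */
};

/* Read callback with the signature libxml2 expects for input callbacks:
   fill BUFFER with up to LEN bytes and return the number of bytes
   written; 0 means end of input.  */
int
two_part_read (void *context, char *buffer, int len)
{
  struct two_part_source *src = context;
  size_t total = src->before_len + src->after_len;
  int written = 0;

  assert (buffer != NULL && len >= 0);
  while (written < len && src->pos < total)
    {
      const char *seg;
      size_t seg_off, seg_left;
      if (src->pos < src->before_len)
        {
          seg = src->before;
          seg_off = src->pos;
          seg_left = src->before_len - src->pos;
        }
      else
        {
          seg = src->after;
          seg_off = src->pos - src->before_len;
          seg_left = total - src->pos;
        }
      size_t want = (size_t) (len - written);
      size_t n = seg_left < want ? seg_left : want;
      memcpy (buffer + written, seg + seg_off, n);
      written += (int) n;
      src->pos += n;
    }
  return written;
}
```

With a callback like this the parser pulls text in chunks, so no contiguous copy of the whole region has to be built first; the cost is one function call per chunk.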

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: Linking Emacs with libxml2
  2010-09-08 14:10           ` Lars Magne Ingebrigtsen
  2010-09-08 14:25             ` Andreas Schwab
@ 2010-09-08 14:40             ` Stefan Monnier
  2010-09-08 15:16               ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 70+ messages in thread
From: Stefan Monnier @ 2010-09-08 14:40 UTC (permalink / raw)
  To: emacs-devel

>> I'll take a whack at providing an interface to htmlParseDoc(), then, if
>> nobody else beats me to it...
> My main problem is, of course, the most trivial one -- how do I take a
> (narrowed) buffer, apply the charset decoding methods, and then end up with
> a C string that I can feed to the library?  There must be a convenient
> utility function somewhere, but I haven't been able to find it.

I don't think there's such a utility function.  But since the internal
encoding of multibyte buffers is a variant of utf-8, you should be able
to feed the internal byte-stream directly without extra decoding
(assuming libxml2 accepts utf-8 input, of course).
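
One reason there is no ready-made "buffer as C string" accessor is that buffer text is not stored contiguously: it is held with a gap at the editing position.  The sketch below shows, on a simplified and hypothetical struct gap_buffer (not Emacs's real buffer structure), how a region can be copied into one NUL-terminated string by skipping the gap.  It copies bytes as they are and does not check that they are valid UTF-8.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified gap buffer.  Text occupies the byte ranges
   [0, gap_start) and [gap_end, size) of DATA; the bytes in between are
   unused.  */
struct gap_buffer
{
  const char *data;
  size_t gap_start;
  size_t gap_end;
  size_t size;
};

/* Copy the logical byte range [FROM, TO) into a freshly allocated,
   NUL-terminated string.  Returns NULL if allocation fails; the caller
   frees the result.  */
char *
gap_buffer_copy (const struct gap_buffer *b, size_t from, size_t to)
{
  size_t gap_len = b->gap_end - b->gap_start;
  size_t text_len = b->size - gap_len;
  size_t n = 0;

  assert (from <= to && to <= text_len);
  char *out = malloc (to - from + 1);
  if (out == NULL)
    return NULL;

  /* Part of the range that lies before the gap.  */
  if (from < b->gap_start)
    {
      size_t end = to < b->gap_start ? to : b->gap_start;
      memcpy (out, b->data + from, end - from);
      n = end - from;
    }
  /* Part after the gap: logical position P is stored at P + gap_len.  */
  if (to > b->gap_start)
    {
      size_t start = from > b->gap_start ? from : b->gap_start;
      memcpy (out + n, b->data + start + gap_len, to - start);
      n += to - start;
    }
  out[n] = '\0';
  return out;
}
```

A narrowed region corresponds to choosing FROM and TO inside the text; the alternative to copying is to move the gap out of the region so that the region becomes contiguous in place.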


        Stefan




* Re: Linking Emacs with libxml2
  2010-09-08 14:40             ` Stefan Monnier
@ 2010-09-08 15:16               ` Lars Magne Ingebrigtsen
  2010-09-08 16:15                 ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-08 15:16 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> I don't think there's such a utility function.  But since the internal
> encoding of multibyte buffers is a variant of utf-8, you should be able
> to feed the internal byte-stream directly without extra decoding
> (assuming libxml2 accepts utf-8 input, of course).

Yes, it does.  So how do I get the internal byte-stream of the buffer?
:-)

Except for that minor detail, it's now implemented:

(html-parse-buffer)
=>
(html
 (head)
 (body
  (:width . "100")
  (div
   (:class . "thing")
   (text
    (text . "Hei")))))
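
The shape of that result can be reproduced with a short recursive walk.  The C sketch below is one plausible reading of the output above, not the code from the patch: struct node and struct attr are hypothetical stand-ins for libxml2's node and attribute structures, the result is written on a single line, and quotes inside strings are not escaped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for a parsed tree.  */
struct attr
{
  const char *name;
  const char *value;
  const struct attr *next;
};

struct node
{
  const char *name;             /* element name, or "text" for text nodes */
  const char *content;          /* text content, or NULL */
  const struct attr *attrs;     /* attribute list, or NULL */
  const struct node *children;  /* first child, or NULL */
  const struct node *next;      /* next sibling, or NULL */
};

/* Append S to the NUL-terminated string OUT of capacity CAP,
   truncating if it does not fit.  */
static void
emit (char *out, size_t cap, const char *s)
{
  size_t used = strlen (out);
  assert (used < cap);
  strncat (out, s, cap - used - 1);
}

/* Append N as (name (:attr . "value")... (text . "content") children...).
   OUT must already hold a NUL-terminated string, possibly empty.  */
void
node_to_sexp (const struct node *n, char *out, size_t cap)
{
  emit (out, cap, "(");
  emit (out, cap, n->name);
  for (const struct attr *a = n->attrs; a != NULL; a = a->next)
    {
      emit (out, cap, " (:");
      emit (out, cap, a->name);
      emit (out, cap, " . \"");
      emit (out, cap, a->value);
      emit (out, cap, "\")");
    }
  if (n->content != NULL)
    {
      emit (out, cap, " (text . \"");
      emit (out, cap, n->content);
      emit (out, cap, "\")");
    }
  for (const struct node *c = n->children; c != NULL; c = c->next)
    {
      emit (out, cap, " ");
      node_to_sexp (c, out, cap);
    }
  emit (out, cap, ")");
}
```

Because every node is printed the same way, a text node named "text" that carries content comes out as a nested (text (text . "...")) form, which matches the example.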

I have to delve into the mysteries of the autoconf a bit to get the -I
line right, though...    

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: Linking Emacs with libxml2
  2010-09-08 15:16               ` Lars Magne Ingebrigtsen
@ 2010-09-08 16:15                 ` Lars Magne Ingebrigtsen
  2010-09-08 18:17                   ` joakim
  2010-09-08 19:10                   ` Andreas Schwab
  0 siblings, 2 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-08 16:15 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 24 bytes --]

I did it the hard way:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: libxml.diff --]
[-- Type: text/x-diff, Size: 10454 bytes --]

=== modified file 'ChangeLog'
--- ChangeLog	2010-09-04 07:30:14 +0000
+++ ChangeLog	2010-09-08 16:12:36 +0000
@@ -1,3 +1,7 @@
+2010-09-08  Lars Magne Ingebrigtsen  <larsi@gnus.org>
+
+	* configure.in: Check for libxml2/htmlReadMemory().
+
 2010-09-04  Eli Zaretskii  <eliz@gnu.org>
 
 	* config.bat: Produce lisp/gnus/_dir-locals.el from

=== modified file 'configure'
--- configure	2010-08-23 12:54:09 +0000
+++ configure	2010-09-08 15:55:18 +0000
@@ -660,6 +660,8 @@
 LIBS_MAIL
 liblockfile
 ALLOCA
+LIBXML2_CFLAGS
+LIBXML2_LIBS
 LIBXSM
 LIBGPM
 LIBGIF
@@ -11070,6 +11072,74 @@
 fi
 
 
+### Use libxml2 (-lxml2) if available
+HAVE_LIBXML2=no
+LIBXML2_LIBS=
+if test -n xml2-config; then
+  LIBXML2_CFLAGS="`xml2-config --cflags`"
+  SAVE_CFLAGS="$CFLAGS"
+  CFLAGS="$LIBXML2_CFLAGS $CFLAGS"
+  ac_fn_c_check_header_mongrel "$LINENO" "libxml/xmlexports.h" "ac_cv_header_libxml_xmlexports_h" "$ac_includes_default"
+if test "x$ac_cv_header_libxml_xmlexports_h" = x""yes; then :
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for htmlReadMemory in -lxml2" >&5
+$as_echo_n "checking for htmlReadMemory in -lxml2... " >&6; }
+if test "${ac_cv_lib_xml2_htmlReadMemory+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lxml2 -lxml2 $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char htmlReadMemory ();
+int
+main ()
+{
+return htmlReadMemory ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_xml2_htmlReadMemory=yes
+else
+  ac_cv_lib_xml2_htmlReadMemory=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_xml2_htmlReadMemory" >&5
+$as_echo "$ac_cv_lib_xml2_htmlReadMemory" >&6; }
+if test "x$ac_cv_lib_xml2_htmlReadMemory" = x""yes; then :
+  HAVE_LIBXML2=yes
+fi
+
+fi
+
+
+
+  if test "${HAVE_LIBXML2}" = "yes"; then
+
+$as_echo "#define HAVE_LIBXML2 1" >>confdefs.h
+
+    LIBXML2_LIBS="-lxml2"
+    case "$LIBS" in
+      *-lxml2*) ;;
+      *)      LIBS="$LIBXML2_LIBS $LIBS" ;;
+    esac
+  fi
+  CFLAGS="$SAVE_CFLAGS"
+fi
+
+
+
 # If netdb.h doesn't declare h_errno, we must declare it by hand.
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether netdb declares h_errno" >&5
 $as_echo_n "checking whether netdb declares h_errno... " >&6; }

=== modified file 'configure.in'
--- configure.in	2010-08-23 12:54:09 +0000
+++ configure.in	2010-09-08 15:55:38 +0000
@@ -2535,6 +2535,29 @@
 fi
 AC_SUBST(LIBXSM)
 
+### Use libxml2 (-lxml2) if available
+HAVE_LIBXML2=no
+LIBXML2_LIBS=
+if test -n xml2-config; then
+  LIBXML2_CFLAGS="`xml2-config --cflags`"
+  SAVE_CFLAGS="$CFLAGS"
+  CFLAGS="$LIBXML2_CFLAGS $CFLAGS"
+  AC_CHECK_HEADER(libxml/xmlversion.h,
+    [AC_CHECK_LIB(xml2, htmlReadMemory, HAVE_LIBXML2=yes, , -lxml2)])
+
+  if test "${HAVE_LIBXML2}" = "yes"; then
+    AC_DEFINE(HAVE_LIBXML2, 1, [Define to 1 if you have the libxml2 library (-lxml2).])
+    LIBXML2_LIBS="-lxml2"
+    case "$LIBS" in
+      *-lxml2*) ;;
+      *)      LIBS="$LIBXML2_LIBS $LIBS" ;;
+    esac
+  fi
+  CFLAGS="$SAVE_CFLAGS"
+fi
+AC_SUBST(LIBXML2_LIBS)
+AC_SUBST(LIBXML2_CFLAGS)
+
 # If netdb.h doesn't declare h_errno, we must declare it by hand.
 AC_CACHE_CHECK(whether netdb declares h_errno,
 	       emacs_cv_netdb_declares_h_errno,

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2010-09-05 02:06:39 +0000
+++ src/ChangeLog	2010-09-08 16:12:09 +0000
@@ -1,3 +1,9 @@
+2010-09-08  Lars Magne Ingebrigtsen  <larsi@gnus.org>
+
+	* xml.c: New file.
+	(Fhtml_parse_buffer): New function to interface to the libxml2
+	html parsing function.
+
 2010-09-05  Juanma Barranquero  <lekktu@gmail.com>
 
 	* biditype.h: Regenerate.

=== modified file 'src/Makefile.in'
--- src/Makefile.in	2010-08-17 21:19:11 +0000
+++ src/Makefile.in	2010-09-08 15:52:01 +0000
@@ -226,6 +226,9 @@
 IMAGEMAGICK_LIBS= @IMAGEMAGICK_LIBS@
 IMAGEMAGICK_CFLAGS= @IMAGEMAGICK_CFLAGS@
 
+LIBXML2_LIBS = @LIBXML2_LIBS@
+LIBXML2_CFLAGS = @LIBXML2_CFLAGS@
+
 
 ## widget.o if USE_X_TOOLKIT, otherwise empty.
 WIDGET_OBJ=@WIDGET_OBJ@
@@ -320,7 +323,8 @@
 ## FIXME? MYCPPFLAGS only referenced in etc/DEBUG.
 ALL_CFLAGS=-Demacs -DHAVE_CONFIG_H $(MYCPPFLAGS) -I. -I${srcdir} \
   ${C_SWITCH_MACHINE} ${C_SWITCH_SYSTEM} ${C_SWITCH_X_SITE} \
-  ${C_SWITCH_X_SYSTEM} ${CFLAGS_SOUND} ${RSVG_CFLAGS} ${IMAGEMAGICK_CFLAGS} ${DBUS_CFLAGS} \
+  ${C_SWITCH_X_SYSTEM} ${CFLAGS_SOUND} ${RSVG_CFLAGS} ${IMAGEMAGICK_CFLAGS} \
+  ${LIBXML2_CFLAGS} ${DBUS_CFLAGS} \
   ${GCONF_CFLAGS} ${FREETYPE_CFLAGS} ${FONTCONFIG_CFLAGS} \
   ${LIBOTF_CFLAGS} ${M17N_FLT_CFLAGS} ${DEPFLAGS} ${PROFILING_CFLAGS} \
   ${C_WARNINGS_SWITCH} ${CFLAGS}
@@ -349,7 +353,7 @@
 	syntax.o $(UNEXEC_OBJ) bytecode.o \
 	process.o callproc.o \
 	region-cache.o sound.o atimer.o \
-	doprnt.o strftime.o intervals.o textprop.o composite.o md5.o \
+	doprnt.o strftime.o intervals.o textprop.o composite.o md5.o xml.o \
 	$(MSDOS_OBJ) $(MSDOS_X_OBJ) $(NS_OBJ) $(CYGWIN_OBJ) $(FONT_OBJ)
 
 ## Object files used on some machine or other.
@@ -595,7 +599,8 @@
 ## duplicated symbols.  If the standard libraries were compiled
 ## with GCC, we might need LIB_GCC again after them.
 LIBES = $(LIBS) $(LIBX_BASE) $(LIBX_OTHER) $(LIBSOUND) \
-   $(RSVG_LIBS) ${IMAGEMAGICK_LIBS}  $(DBUS_LIBS) $(LIBGPM) $(LIBRESOLV) $(LIBS_SYSTEM) \
+   $(RSVG_LIBS) ${IMAGEMAGICK_LIBS} $(DBUS_LIBS) \
+   ${LIBXML2_LIBS} $(LIBGPM) $(LIBRESOLV) $(LIBS_SYSTEM) \
    $(LIBS_TERMCAP) $(GETLOADAVG_LIBS) ${GCONF_LIBS} ${LIBSELINUX_LIBS} \
    $(FREETYPE_LIBS) $(FONTCONFIG_LIBS) $(LIBOTF_LIBS) $(M17N_FLT_LIBS) \
    $(LIB_GCC) $(LIB_MATH) $(LIB_STANDARD) $(LIB_GCC)

=== modified file 'src/config.in'
--- src/config.in	2010-08-17 21:19:11 +0000
+++ src/config.in	2010-09-08 15:37:34 +0000
@@ -813,6 +813,9 @@
 /* Define to 1 if you have the SM library (-lSM). */
 #undef HAVE_X_SM
 
+/* Define to 1 if you have the libxml2 library (-lxml2). */
+#undef HAVE_LIBXML2
+
 /* Define to 1 if you want to use the X window system. */
 #undef HAVE_X_WINDOWS
 

=== modified file 'src/emacs.c'
--- src/emacs.c	2010-08-22 21:15:20 +0000
+++ src/emacs.c	2010-09-08 13:39:17 +0000
@@ -1543,6 +1543,7 @@
       syms_of_xselect ();
 #endif
 #endif /* HAVE_X_WINDOWS */
+      syms_of_xml ();
 
       syms_of_menu ();
 

=== modified file 'src/lisp.h'
--- src/lisp.h	2010-08-09 19:25:41 +0000
+++ src/lisp.h	2010-09-08 13:40:50 +0000
@@ -3559,6 +3559,9 @@
 /* Defined in xsmfns.c */
 extern void syms_of_xsmfns (void);
 
+/* Defined in xml.c */
+extern void syms_of_xml (void);
+
 /* Defined in xselect.c */
 EXFUN (Fx_send_client_event, 6);
 extern void syms_of_xselect (void);

=== added file 'src/xml.c'
--- src/xml.c	1970-01-01 00:00:00 +0000
+++ src/xml.c	2010-09-08 16:10:36 +0000
@@ -0,0 +1,131 @@
+/* Interface to libxml2.
+   Copyright (C) 2010 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <config.h>
+
+#ifdef HAVE_LIBXML2
+
+#include <sys/param.h>
+#include <stdio.h>
+#include <setjmp.h>
+#include <libxml/tree.h>
+#include <libxml/parser.h>
+#include <libxml/HTMLparser.h>
+
+#include "lisp.h"
+#include "systime.h"
+#include "sysselect.h"
+#include "frame.h"
+#include "buffer.h"
+
+Lisp_Object make_dom (xmlNode *node)
+{
+  Lisp_Object result = Qnil;
+  xmlNode *child;
+  xmlAttr *property;
+
+  if (node != NULL) {
+    result = Fcons (Fintern (build_string (node->name),
+			     Vobarray),
+		    Qnil);
+    property = node->properties;
+    while (property != NULL) {
+      if (property->children &&
+	   property->children->content) {
+	char *pname = xmalloc(strlen(property->name) + 2);
+	*pname = ':';
+	strcpy(pname + 1, property->name);
+	result = Fcons (Fcons (Fintern (build_string (pname), Vobarray),
+			       build_string(property->children->content)),
+			result);
+	xfree (pname);
+      }
+      property = property->next;
+    }
+    child = node->children;
+    while (child != NULL) {
+      result = Fcons (make_dom (child), result);
+      child = child->next;
+    }
+    if (node->content)
+      result = Fcons (Fcons (Fintern (build_string ("text"), Vobarray),
+			     build_string(node->content)),
+		      result);
+  }
+  return Fnreverse(result);
+}
+
+DEFUN ("html-parse-buffer", Fhtml_parse_buffer, Shtml_parse_buffer,
+       0, 1, 0,
+       doc: /* Parse the buffer as an HTML document and return the parse tree.*/)
+  (Lisp_Object object)
+{
+  xmlDoc *doc;
+  struct buffer *buffer;
+  xmlNode *node;
+  unsigned char *string, *s;
+  Lisp_Object result;
+  int ibeg, iend;
+
+  LIBXML_TEST_VERSION
+	
+  if (NILP (object))
+    buffer = current_buffer;
+  else {
+    CHECK_BUFFER (object);
+    buffer = XBUFFER (object);
+  }
+
+  ibeg = CHAR_TO_BYTE (XFASTINT (Fpoint_min ()));
+  iend = CHAR_TO_BYTE (XFASTINT (Fpoint_max ()));
+  move_gap_both (XFASTINT (Fpoint_min ()), ibeg);
+  
+  string = (unsigned char *) xmalloc (iend - ibeg + 1);
+  s = string;
+  
+  while (ibeg < iend) {
+    *s++ = *(BYTE_POS_ADDR (ibeg));
+    ibeg++;
+  }
+  *s = 0;
+  
+  doc = htmlReadMemory (string, strlen(string), "", "utf-8", 0);
+
+  if (doc == NULL)
+    return Qnil;
+
+  node = xmlDocGetRootElement (doc);
+  result = make_dom (node);
+  
+  xmlFreeDoc(doc);
+  xmlCleanupParser();
+      
+  return result;
+}
+
+\f
+/***********************************************************************
+			    Initialization
+ ***********************************************************************/
+void
+syms_of_xml (void)
+{
+  defsubr (&Shtml_parse_buffer);
+}
+
+#endif /* HAVE_LIBXML2 */


[-- Attachment #3: Type: text/plain, Size: 759 bytes --]


This compiles and works for me, but I'm not really an Emacs internals
expert.  Ahem.

Or an autoconf one, for that matter.  ./configure finds the stuff it's
looking for, but I get this warning:

-------
[larsi@quimbies ~/src/emacs/trunk]$ ./configure  | grep xml
checking libxml/xmlversion.h usability... yes
checking libxml/xmlversion.h presence... no
configure: WARNING: libxml/xmlversion.h: accepted by the compiler, rejected by the preprocessor!
configure: WARNING: libxml/xmlversion.h: proceeding with the compiler's result
checking for libxml/xmlversion.h... yes
checking for htmlReadMemory in -lxml2... yes
-------

I'm not sure what that means...

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-08 16:15                 ` Lars Magne Ingebrigtsen
@ 2010-09-08 18:17                   ` joakim
  2010-09-08 18:19                     ` Lars Magne Ingebrigtsen
  2010-09-08 19:10                   ` Andreas Schwab
  1 sibling, 1 reply; 70+ messages in thread
From: joakim @ 2010-09-08 18:17 UTC (permalink / raw)
  To: emacs-devel

Maybe make a Savannah bzr branch for this then?

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> I did it the hard way:

-- 
Joakim Verona



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-08 18:17                   ` joakim
@ 2010-09-08 18:19                     ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-08 18:19 UTC (permalink / raw)
  To: emacs-devel

joakim@verona.se writes:

> Maybe make a Savannah bzr branch for this then?

Is there a point in making this into a branch?

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-08 16:15                 ` Lars Magne Ingebrigtsen
  2010-09-08 18:17                   ` joakim
@ 2010-09-08 19:10                   ` Andreas Schwab
  2010-09-08 20:11                     ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 70+ messages in thread
From: Andreas Schwab @ 2010-09-08 19:10 UTC (permalink / raw)
  To: emacs-devel

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> +### Use libxml2 (-lxml2) if available
> +HAVE_LIBXML2=no
> +LIBXML2_LIBS=
> +if test -n xml2-config; then

What's the point of this test?

> +  LIBXML2_CFLAGS="`xml2-config --cflags`"
> +  SAVE_CFLAGS="$CFLAGS"
> +  CFLAGS="$LIBXML2_CFLAGS $CFLAGS"
> +  AC_CHECK_HEADER(libxml/xmlversion.h,
> +    [AC_CHECK_LIB(xml2, htmlReadMemory, HAVE_LIBXML2=yes, , -lxml2)])

Please use PKG_CHECK_MODULES.

> +    result = Fcons (Fintern (build_string (node->name),
> +			     Vobarray),

                       intern (node->name)

> +  ibeg = CHAR_TO_BYTE (XFASTINT (Fpoint_min ()));
> +  iend = CHAR_TO_BYTE (XFASTINT (Fpoint_max ()));
> +  move_gap_both (XFASTINT (Fpoint_min ()), ibeg);
> +  while (ibeg < iend) {
> +    *s++ = *(BYTE_POS_ADDR (ibeg));
> +    ibeg++;
> +  }

     Lisp_Object s = make_buffer_string (BEGV, ZV, 0);

> +  doc = htmlReadMemory (string, strlen(string), "", "utf-8", 0);

                           SDATA (s), SBYTES (s)

Note that the internal encoding is utf-8-emacs which is different from
utf-8 in details.

> Or an autoconf one, for that matter.  ./configure finds the stuff it's
> looking for, but I get this warning:
>
> -------
> [larsi@quimbies ~/src/emacs/trunk]$ ./configure  | grep xml
> checking libxml/xmlversion.h usability... yes
> checking libxml/xmlversion.h presence... no
> configure: WARNING: libxml/xmlversion.h: accepted by the compiler, rejected by the preprocessor!
> configure: WARNING: libxml/xmlversion.h: proceeding with the compiler's result
> checking for libxml/xmlversion.h... yes
> checking for htmlReadMemory in -lxml2... yes
> -------
>
> I'm not sure what that means...

Look at config.log.
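
One likely cause of this warning, offered as an assumption rather than something confirmed in the thread: the "usability" check compiles with $CFLAGS, while the "presence" check runs the preprocessor with $CPPFLAGS only, and the patch adds the xml2-config flags to CFLAGS alone. A minimal sketch of saving and restoring CPPFLAGS in the same way follows; `with_cppflags` is a hypothetical helper and the -I path is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: run a command with extra flags prepended to
# CPPFLAGS, then restore the old value, mirroring what the patch
# already does for CFLAGS with SAVE_CFLAGS.
with_cppflags () {
  extra=$1; shift
  save_CPPFLAGS=$CPPFLAGS
  CPPFLAGS="$extra $CPPFLAGS"
  export CPPFLAGS
  "$@"
  status=$?
  CPPFLAGS=$save_CPPFLAGS
  return $status
}

CPPFLAGS="-DEXISTING"
# Stand-in for the header check; a real configure script would run
# its preprocessor test here instead of echoing the variable.
with_cppflags "-I/usr/include/libxml2" sh -c 'echo "during: $CPPFLAGS"'
echo "after:  $CPPFLAGS"
```

In configure.in terms this would amount to setting CPPFLAGS next to CFLAGS before AC_CHECK_HEADER and restoring both afterwards; config.log should show which of the two invocations lacked the include path.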

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-08 19:10                   ` Andreas Schwab
@ 2010-09-08 20:11                     ` Lars Magne Ingebrigtsen
  2010-09-08 20:30                       ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-08 20:11 UTC (permalink / raw)
  To: emacs-devel

Andreas Schwab <schwab@linux-m68k.org> writes:

>> +if test -n xml2-config; then
>
> What's the point of this test?

Wrong test.  I've now changed it to use AC_CHECK_PROGS.
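
For reference, the original check could never fail: `test -n` was given the literal word xml2-config, which is a non-empty string whether or not the program is installed. A small sketch of the difference, using `command -v` purely as a stand-in for the PATH search that AC_CHECK_PROGS performs (`have_prog` is a hypothetical helper, not part of the patch):

```shell
#!/bin/sh
# The original check: the argument is a non-empty literal, so this
# succeeds regardless of whether xml2-config exists on the system.
if test -n xml2-config; then
  literal_check=yes
else
  literal_check=no
fi

# A lookup that can actually fail.  `command -v` only stands in for
# the PATH search that AC_CHECK_PROGS would generate.
have_prog () {
  command -v "$1" >/dev/null 2>&1
}

if have_prog xml2-config; then
  found=yes
else
  found=no
fi

echo "literal test: $literal_check"
echo "PATH lookup:  $found"
```

The first line always reports yes; the second depends on the machine, which is the behaviour a configure check needs.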

>> +  AC_CHECK_HEADER(libxml/xmlversion.h,
>> +    [AC_CHECK_LIB(xml2, htmlReadMemory, HAVE_LIBXML2=yes, , -lxml2)])
>
> Please use PKG_CHECK_MODULES.

I've tried to understand how PKG_CHECK_MODULES works, and I have no
idea.  What would be the correct incantation?

>> +    result = Fcons (Fintern (build_string (node->name),
>> +			     Vobarray),
>
>                        intern (node->name)
>
>> +  ibeg = CHAR_TO_BYTE (XFASTINT (Fpoint_min ()));
>> +  iend = CHAR_TO_BYTE (XFASTINT (Fpoint_max ()));
>> +  move_gap_both (XFASTINT (Fpoint_min ()), ibeg);
>> +  while (ibeg < iend) {
>> +    *s++ = *(BYTE_POS_ADDR (ibeg));
>> +    ibeg++;
>> +  }
>
>      Lisp_Object s = make_buffer_string (BEGV, ZV, 0);

Thanks; fixed.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-08 20:11                     ` Lars Magne Ingebrigtsen
@ 2010-09-08 20:30                       ` Lars Magne Ingebrigtsen
  2010-09-08 20:58                         ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-08 20:30 UTC (permalink / raw)
  To: emacs-devel

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> I've tried to understand how PKG_CHECK_MODULES works, and I have no
> idea.  What would be the correct incantation?

Never mind.  It's something like:

PKG_CHECK_MODULES(LIBXML2, "libxml-2.0 > 2.0.0", HAVE_LIBXML2=yes, HAVE_LIBXML2=no)
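
Roughly, the macro boils down to three pkg-config queries: `--exists` on the module spec, then `--cflags` and `--libs` to fill LIBXML2_CFLAGS and LIBXML2_LIBS. A hand-written sketch of that sequence, assuming pkg-config may or may not be installed (`check_module` is a hypothetical helper, and the real macro also checks the pkg-config version and collects error text):

```shell
#!/bin/sh
# Hypothetical helper: ask pkg-config by hand for what
# PKG_CHECK_MODULES(LIBXML2, spec, ...) would look up.
check_module () {
  spec=$1
  HAVE_LIBXML2=no
  LIBXML2_CFLAGS=
  LIBXML2_LIBS=
  # A missing pkg-config simply counts as "module not found".
  command -v pkg-config >/dev/null 2>&1 || return 0
  if pkg-config --exists "$spec" 2>/dev/null; then
    HAVE_LIBXML2=yes
    LIBXML2_CFLAGS=`pkg-config --cflags "$spec"`
    LIBXML2_LIBS=`pkg-config --libs "$spec"`
  fi
}

check_module "libxml-2.0 > 2.0.0"
echo "HAVE_LIBXML2=$HAVE_LIBXML2"
echo "LIBXML2_CFLAGS=$LIBXML2_CFLAGS"
echo "LIBXML2_LIBS=$LIBXML2_LIBS"
```

The results land in the same variable names the Makefile.in changes substitute, so with this approach the separate xml2-config call is no longer needed.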

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-08 20:30                       ` Lars Magne Ingebrigtsen
@ 2010-09-08 20:58                         ` Lars Magne Ingebrigtsen
  2010-09-08 21:51                           ` Andreas Schwab
  2010-09-09  8:35                           ` Christian Faulhammer
  0 siblings, 2 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-08 20:58 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 22 bytes --]

And here's take two:


[-- Attachment #2: libxml.diff-2 --]
[-- Type: application/octet-stream, Size: 11581 bytes --]

=== modified file 'ChangeLog'
--- ChangeLog	2010-09-04 07:30:14 +0000
+++ ChangeLog	2010-09-08 16:12:36 +0000
@@ -1,3 +1,7 @@
+2010-09-08  Lars Magne Ingebrigtsen  <larsi@gnus.org>
+
+	* configure.in: Check for libxml2/htmlReadMemory().
+
 2010-09-04  Eli Zaretskii  <eliz@gnu.org>
 
 	* config.bat: Produce lisp/gnus/_dir-locals.el from

=== modified file 'configure'
--- configure	2010-08-23 12:54:09 +0000
+++ configure	2010-09-08 20:50:02 +0000
@@ -660,6 +660,8 @@
 LIBS_MAIL
 liblockfile
 ALLOCA
+LIBXML2_LIBS
+LIBXML2_CFLAGS
 LIBXSM
 LIBGPM
 LIBGIF
@@ -11070,6 +11072,109 @@
 fi
 
 
+### Use libxml (-lxml2) if available
+
+  succeeded=no
+
+  # Extract the first word of "pkg-config", so it can be a program name with args.
+set dummy pkg-config; ac_word=$2
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
+$as_echo_n "checking for $ac_word... " >&6; }
+if test "${ac_cv_path_PKG_CONFIG+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  case $PKG_CONFIG in
+  [\\/]* | ?:[\\/]*)
+  ac_cv_path_PKG_CONFIG="$PKG_CONFIG" # Let the user override the test with a path.
+  ;;
+  *)
+  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH
+do
+  IFS=$as_save_IFS
+  test -z "$as_dir" && as_dir=.
+    for ac_exec_ext in '' $ac_executable_extensions; do
+  if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then
+    ac_cv_path_PKG_CONFIG="$as_dir/$ac_word$ac_exec_ext"
+    $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
+    break 2
+  fi
+done
+  done
+IFS=$as_save_IFS
+
+  test -z "$ac_cv_path_PKG_CONFIG" && ac_cv_path_PKG_CONFIG="no"
+  ;;
+esac
+fi
+PKG_CONFIG=$ac_cv_path_PKG_CONFIG
+if test -n "$PKG_CONFIG"; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $PKG_CONFIG" >&5
+$as_echo "$PKG_CONFIG" >&6; }
+else
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+fi
+
+
+
+  if test "$PKG_CONFIG" = "no" ; then
+     HAVE_LIBXML2=no
+  else
+     PKG_CONFIG_MIN_VERSION=0.9.0
+     if $PKG_CONFIG --atleast-pkgconfig-version $PKG_CONFIG_MIN_VERSION; then
+        { $as_echo "$as_me:${as_lineno-$LINENO}: checking for libxml-2.0 > 2.5.0" >&5
+$as_echo_n "checking for libxml-2.0 > 2.5.0... " >&6; }
+
+        if $PKG_CONFIG --exists "libxml-2.0 > 2.5.0" 2>&5; then
+            { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }
+            succeeded=yes
+
+            { $as_echo "$as_me:${as_lineno-$LINENO}: checking LIBXML2_CFLAGS" >&5
+$as_echo_n "checking LIBXML2_CFLAGS... " >&6; }
+            LIBXML2_CFLAGS=`$PKG_CONFIG --cflags "libxml-2.0 > 2.5.0"|sed -e 's,///*,/,g'`
+            { $as_echo "$as_me:${as_lineno-$LINENO}: result: $LIBXML2_CFLAGS" >&5
+$as_echo "$LIBXML2_CFLAGS" >&6; }
+
+            { $as_echo "$as_me:${as_lineno-$LINENO}: checking LIBXML2_LIBS" >&5
+$as_echo_n "checking LIBXML2_LIBS... " >&6; }
+            LIBXML2_LIBS=`$PKG_CONFIG --libs "libxml-2.0 > 2.5.0"|sed -e 's,///*,/,g'`
+            { $as_echo "$as_me:${as_lineno-$LINENO}: result: $LIBXML2_LIBS" >&5
+$as_echo "$LIBXML2_LIBS" >&6; }
+        else
+            { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+            LIBXML2_CFLAGS=""
+            LIBXML2_LIBS=""
+            ## If we have a custom action on failure, don't print errors, but
+            ## do set a variable so people can do so.
+            LIBXML2_PKG_ERRORS=`$PKG_CONFIG --errors-to-stdout --print-errors "libxml-2.0 > 2.5.0"`
+
+        fi
+
+
+
+     else
+        echo "*** Your version of pkg-config is too old. You need version $PKG_CONFIG_MIN_VERSION or newer."
+        echo "*** See http://www.freedesktop.org/software/pkgconfig"
+     fi
+  fi
+
+  if test $succeeded = yes; then
+     HAVE_LIBXML2=yes
+  else
+     HAVE_LIBXML2=no
+  fi
+
+
+
+if test "${HAVE_LIBXML2}" = "yes"; then
+
+$as_echo "#define HAVE_LIBXML2 1" >>confdefs.h
+
+fi
+
 # If netdb.h doesn't declare h_errno, we must declare it by hand.
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether netdb declares h_errno" >&5
 $as_echo_n "checking whether netdb declares h_errno... " >&6; }

=== modified file 'configure.in'
--- configure.in	2010-08-23 12:54:09 +0000
+++ configure.in	2010-09-08 20:50:00 +0000
@@ -2535,6 +2535,14 @@
 fi
 AC_SUBST(LIBXSM)
 
+### Use libxml (-lxml2) if available
+PKG_CHECK_MODULES(LIBXML2, libxml-2.0 > 2.5.0, HAVE_LIBXML2=yes, HAVE_LIBXML2=no)
+AC_SUBST(LIBXML2_LIBS)
+AC_SUBST(LIBXML2_CFLAGS)
+if test "${HAVE_LIBXML2}" = "yes"; then
+  AC_DEFINE(HAVE_LIBXML2, 1, [Define to 1 if you have the libxml library (-lxml2).])
+fi
+
 # If netdb.h doesn't declare h_errno, we must declare it by hand.
 AC_CACHE_CHECK(whether netdb declares h_errno,
 	       emacs_cv_netdb_declares_h_errno,

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2010-09-05 02:06:39 +0000
+++ src/ChangeLog	2010-09-08 16:12:09 +0000
@@ -1,3 +1,9 @@
+2010-09-08  Lars Magne Ingebrigtsen  <larsi@gnus.org>
+
+	* xml.c: New file.
+	(Fhtml_parse_buffer): New function to interface to the libxml2
+	html parsing function.
+
 2010-09-05  Juanma Barranquero  <lekktu@gmail.com>
 
 	* biditype.h: Regenerate.

=== modified file 'src/Makefile.in'
--- src/Makefile.in	2010-08-17 21:19:11 +0000
+++ src/Makefile.in	2010-09-08 15:52:01 +0000
@@ -226,6 +226,9 @@
 IMAGEMAGICK_LIBS= @IMAGEMAGICK_LIBS@
 IMAGEMAGICK_CFLAGS= @IMAGEMAGICK_CFLAGS@
 
+LIBXML2_LIBS = @LIBXML2_LIBS@
+LIBXML2_CFLAGS = @LIBXML2_CFLAGS@
+
 
 ## widget.o if USE_X_TOOLKIT, otherwise empty.
 WIDGET_OBJ=@WIDGET_OBJ@
@@ -320,7 +323,8 @@
 ## FIXME? MYCPPFLAGS only referenced in etc/DEBUG.
 ALL_CFLAGS=-Demacs -DHAVE_CONFIG_H $(MYCPPFLAGS) -I. -I${srcdir} \
   ${C_SWITCH_MACHINE} ${C_SWITCH_SYSTEM} ${C_SWITCH_X_SITE} \
-  ${C_SWITCH_X_SYSTEM} ${CFLAGS_SOUND} ${RSVG_CFLAGS} ${IMAGEMAGICK_CFLAGS} ${DBUS_CFLAGS} \
+  ${C_SWITCH_X_SYSTEM} ${CFLAGS_SOUND} ${RSVG_CFLAGS} ${IMAGEMAGICK_CFLAGS} \
+  ${LIBXML2_CFLAGS} ${DBUS_CFLAGS} \
   ${GCONF_CFLAGS} ${FREETYPE_CFLAGS} ${FONTCONFIG_CFLAGS} \
   ${LIBOTF_CFLAGS} ${M17N_FLT_CFLAGS} ${DEPFLAGS} ${PROFILING_CFLAGS} \
   ${C_WARNINGS_SWITCH} ${CFLAGS}
@@ -349,7 +353,7 @@
 	syntax.o $(UNEXEC_OBJ) bytecode.o \
 	process.o callproc.o \
 	region-cache.o sound.o atimer.o \
-	doprnt.o strftime.o intervals.o textprop.o composite.o md5.o \
+	doprnt.o strftime.o intervals.o textprop.o composite.o md5.o xml.o \
 	$(MSDOS_OBJ) $(MSDOS_X_OBJ) $(NS_OBJ) $(CYGWIN_OBJ) $(FONT_OBJ)
 
 ## Object files used on some machine or other.
@@ -595,7 +599,8 @@
 ## duplicated symbols.  If the standard libraries were compiled
 ## with GCC, we might need LIB_GCC again after them.
 LIBES = $(LIBS) $(LIBX_BASE) $(LIBX_OTHER) $(LIBSOUND) \
-   $(RSVG_LIBS) ${IMAGEMAGICK_LIBS}  $(DBUS_LIBS) $(LIBGPM) $(LIBRESOLV) $(LIBS_SYSTEM) \
+   $(RSVG_LIBS) ${IMAGEMAGICK_LIBS} $(DBUS_LIBS) \
+   ${LIBXML2_LIBS} $(LIBGPM) $(LIBRESOLV) $(LIBS_SYSTEM) \
    $(LIBS_TERMCAP) $(GETLOADAVG_LIBS) ${GCONF_LIBS} ${LIBSELINUX_LIBS} \
    $(FREETYPE_LIBS) $(FONTCONFIG_LIBS) $(LIBOTF_LIBS) $(M17N_FLT_LIBS) \
    $(LIB_GCC) $(LIB_MATH) $(LIB_STANDARD) $(LIB_GCC)

=== modified file 'src/config.in'
--- src/config.in	2010-08-17 21:19:11 +0000
+++ src/config.in	2010-09-08 15:37:34 +0000
@@ -813,6 +813,9 @@
 /* Define to 1 if you have the SM library (-lSM). */
 #undef HAVE_X_SM
 
+/* Define to 1 if you have the libxml2 library (-lxml2). */
+#undef HAVE_LIBXML2
+
 /* Define to 1 if you want to use the X window system. */
 #undef HAVE_X_WINDOWS
 

=== modified file 'src/emacs.c'
--- src/emacs.c	2010-08-22 21:15:20 +0000
+++ src/emacs.c	2010-09-08 13:39:17 +0000
@@ -1543,6 +1543,7 @@
       syms_of_xselect ();
 #endif
 #endif /* HAVE_X_WINDOWS */
+      syms_of_xml ();
 
       syms_of_menu ();
 

=== modified file 'src/lisp.h'
--- src/lisp.h	2010-08-09 19:25:41 +0000
+++ src/lisp.h	2010-09-08 13:40:50 +0000
@@ -3559,6 +3559,9 @@
 /* Defined in xsmfns.c */
 extern void syms_of_xsmfns (void);
 
+/* Defined in xml.c */
+extern void syms_of_xml (void);
+
 /* Defined in xselect.c */
 EXFUN (Fx_send_client_event, 6);
 extern void syms_of_xselect (void);

=== added file 'src/xml.c'
--- src/xml.c	1970-01-01 00:00:00 +0000
+++ src/xml.c	2010-09-08 20:54:02 +0000
@@ -0,0 +1,129 @@
+/* Interface to libxml2.
+   Copyright (C) 2010 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <config.h>
+
+#ifdef HAVE_LIBXML2
+
+#include <sys/param.h>
+#include <stdio.h>
+#include <setjmp.h>
+#include <libxml/tree.h>
+#include <libxml/parser.h>
+#include <libxml/HTMLparser.h>
+
+#include "lisp.h"
+#include "systime.h"
+#include "sysselect.h"
+#include "frame.h"
+#include "buffer.h"
+
+Lisp_Object make_dom (xmlNode *node)
+{
+  Lisp_Object result = Qnil;
+  xmlNode *child;
+  xmlAttr *property;
+
+  if (node != NULL) {
+    result = Fcons (intern(node->name), Qnil);
+    property = node->properties;
+    while (property != NULL) {
+      if (property->children &&
+	   property->children->content) {
+	char *pname = xmalloc(strlen(property->name) + 2);
+	*pname = ':';
+	strcpy(pname + 1, property->name);
+	result = Fcons (Fcons (intern (pname),
+			       build_string(property->children->content)),
+			result);
+	xfree (pname);
+      }
+      property = property->next;
+    }
+    child = node->children;
+    while (child != NULL) {
+      result = Fcons (make_dom (child), result);
+      child = child->next;
+    }
+    if (node->content)
+      result = Fcons (Fcons (intern ("text"), 
+			     build_string(node->content)),
+		      result);
+  }
+  return Fnreverse(result);
+}
+
+DEFUN ("html-parse-buffer", Fhtml_parse_buffer, Shtml_parse_buffer,
+       0, 1, 0,
+       doc: /* Parse the buffer as an HTML document and return the parse tree.*/)
+  (Lisp_Object object)
+{
+  xmlDoc *doc;
+  struct buffer *buffer;
+  xmlNode *node;
+  Lisp_Object result, string;
+  int ibeg, iend;
+  struct buffer *prev = current_buffer;
+
+  LIBXML_TEST_VERSION
+	
+  if (NILP (object))
+    buffer = current_buffer;
+  else {
+    CHECK_BUFFER (object);
+    buffer = XBUFFER (object);
+  }
+
+  record_unwind_protect (Fset_buffer, Fcurrent_buffer ());
+
+  if (buffer != current_buffer)
+    set_buffer_internal (buffer);
+  
+  string = make_buffer_string (BEGV, Z, 0);
+  
+  doc = htmlReadMemory (SDATA (string), SBYTES (string), "", "utf-8", 0);
+
+  if (doc == NULL)
+    return Qnil;
+
+  node = xmlDocGetRootElement (doc);
+  result = make_dom (node);
+  
+  xmlFreeDoc(doc);
+  xmlCleanupParser();
+      
+  if (prev != current_buffer)
+    set_buffer_internal (prev);
+  /* Discard the unwind protect for recovering the current
+     buffer.  */
+  specpdl_ptr--;
+
+  return result;
+}
+
+\f
+/***********************************************************************
+			    Initialization
+ ***********************************************************************/
+void
+syms_of_xml (void)
+{
+  defsubr (&Shtml_parse_buffer);
+}
+
+#endif /* HAVE_LIBXML2 */


[-- Attachment #3: Type: text/plain, Size: 103 bytes --]


-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-08 20:58                         ` Lars Magne Ingebrigtsen
@ 2010-09-08 21:51                           ` Andreas Schwab
  2010-09-08 21:54                             ` Lars Magne Ingebrigtsen
  2010-09-09 17:00                             ` Stefan Monnier
  2010-09-09  8:35                           ` Christian Faulhammer
  1 sibling, 2 replies; 70+ messages in thread
From: Andreas Schwab @ 2010-09-08 21:51 UTC (permalink / raw)
  To: emacs-devel

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> +  string = make_buffer_string (BEGV, Z, 0);

Z is the real buffer end, you want ZV.
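
To make the distinction concrete, here is a toy C model of a narrowed
buffer.  It is only a sketch: toy_buffer and toy_substring are names
invented for illustration, not Emacs internals, and the model ignores
the gap and multibyte text.  In the patch itself the change would just
be make_buffer_string (BEGV, ZV, 0).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a narrowed buffer.  Positions are 1-based, as in
   Emacs.  None of these names exist in the Emacs sources.  */
struct toy_buffer
{
  const char *text;  /* whole buffer text (no gap in this model) */
  int begv;          /* start of the accessible portion */
  int zv;            /* end of the accessible portion */
  int z;             /* end of the whole text */
};

/* Copy the characters in [FROM, TO) into a freshly malloc'd C string.
   Returns NULL if allocation fails; the caller frees the result.  */
static char *
toy_substring (const struct toy_buffer *b, int from, int to)
{
  size_t len;
  char *s;

  assert (1 <= from && from <= to && to <= b->z);
  len = (size_t) (to - from);
  s = malloc (len + 1);
  if (s == NULL)
    return NULL;
  memcpy (s, b->text + (from - 1), len);
  s[len] = '\0';
  return s;
}
```

With the text "junk<p>hi</p>junk" narrowed to the <p> element, copying
up to zv yields only the element, while copying up to z also drags in
the trailing text that narrowing was supposed to hide.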

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-08 21:51                           ` Andreas Schwab
@ 2010-09-08 21:54                             ` Lars Magne Ingebrigtsen
  2010-09-09 17:00                             ` Stefan Monnier
  1 sibling, 0 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-08 21:54 UTC (permalink / raw)
  To: emacs-devel

Andreas Schwab <schwab@linux-m68k.org> writes:

> Z is the real buffer end, you want ZV.

Right; thanks.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-08 20:58                         ` Lars Magne Ingebrigtsen
  2010-09-08 21:51                           ` Andreas Schwab
@ 2010-09-09  8:35                           ` Christian Faulhammer
  2010-09-09 10:33                             ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 70+ messages in thread
From: Christian Faulhammer @ 2010-09-09  8:35 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 356 bytes --]

Hi,

Lars Magne Ingebrigtsen <larsi@gnus.org>:

> And here's take two:

 As far as I can see, the dependency is automagic.  Could it be made
configurable? (Or did I miss something?)

V-Li

-- 
Christian Faulhammer, Gentoo Lisp project
<URL:http://www.gentoo.org/proj/en/lisp/>, #gentoo-lisp on FreeNode

<URL:http://gentoo.faulhammer.org/>


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-09  8:35                           ` Christian Faulhammer
@ 2010-09-09 10:33                             ` Lars Magne Ingebrigtsen
  2010-09-09 11:07                               ` Christian Faulhammer
  0 siblings, 1 reply; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-09 10:33 UTC (permalink / raw)
  To: emacs-devel

Christian Faulhammer <fauli@gentoo.org> writes:

>  As far as I can see, the dependency is automagic.  Could it be made
> configurable? (Or did I miss something?)

Well, I could add a configuration switch to not build it with libxml2
support.  Defaulting it to "off" would make no sense, since librsvg
defaults to "on", and libxml2 is a requirement of librsvg.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-09 10:33                             ` Lars Magne Ingebrigtsen
@ 2010-09-09 11:07                               ` Christian Faulhammer
  2010-09-09 11:09                                 ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 70+ messages in thread
From: Christian Faulhammer @ 2010-09-09 11:07 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 686 bytes --]

Hi,

Lars Magne Ingebrigtsen <larsi@gnus.org>:

> Christian Faulhammer <fauli@gentoo.org> writes:
> 
> >  As far as I can see, the dependency is automagic.  Could it be made
> > configurable? (Or did I miss something?)
> 
> Well, I could add a configuration switch to not build it with libxml2
> support.  Defaulting it to "off" would make no sense, since librsvg
> defaults to "on", and libxml2 is a requirement of librsvg.

 Would be great as those automatic detections make distributors' lives
harder.

V-Li

-- 
Christian Faulhammer, Gentoo Lisp project
<URL:http://www.gentoo.org/proj/en/lisp/>, #gentoo-lisp on FreeNode

<URL:http://gentoo.faulhammer.org/>


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-09 11:07                               ` Christian Faulhammer
@ 2010-09-09 11:09                                 ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-09 11:09 UTC (permalink / raw)
  To: emacs-devel

Christian Faulhammer <fauli@gentoo.org> writes:

>  Would be great as those automatic detections make distributors' lives
> harder.

Ok, I've now added a --with-xml2=no thing to configure.in.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-08 21:51                           ` Andreas Schwab
  2010-09-08 21:54                             ` Lars Magne Ingebrigtsen
@ 2010-09-09 17:00                             ` Stefan Monnier
  2010-09-09 21:56                               ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 70+ messages in thread
From: Stefan Monnier @ 2010-09-09 17:00 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: emacs-devel

>> +  string = make_buffer_string (BEGV, Z, 0);
> Z is the real buffer end, you want ZV.

Of course, even better would be passing the buffer text as two byte arrays
without copying it into a string.

If you have to construct an intermediate string, then you might as well
expose a parse-html-string function and let the Elisp code do
a buffer-string call, which will provide a lot more flexibility.


        Stefan



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-09 17:00                             ` Stefan Monnier
@ 2010-09-09 21:56                               ` Lars Magne Ingebrigtsen
  2010-09-09 22:28                                 ` Stefan Monnier
  0 siblings, 1 reply; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-09 21:56 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Of course, even better would be passing the buffer text as two byte arrays
> without copying it into a string.

How would one do that?  Like in my original code, just copying it over
to a C string?  Or is there a convenience function to do that?

> If you have to construct an intermediate string, then you might as well
> expose a parse-html-string function and let the Elisp code do
> a buffer-string call, which will provide a lot more flexibility.

True.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-09 21:56                               ` Lars Magne Ingebrigtsen
@ 2010-09-09 22:28                                 ` Stefan Monnier
  2010-09-09 22:37                                   ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 70+ messages in thread
From: Stefan Monnier @ 2010-09-09 22:28 UTC (permalink / raw)
  To: emacs-devel

>> Of course, even better would be passing the buffer text as two byte arrays
>> without copying it into a string.
> How would one do that?

You'd have to check the libxml2 API.  On the Emacs side, the buffer is
just made of two char* chunks.

> Like in my original code, just copying it over to a C string?

That would defeat the purpose: if you do the copying, then you might as
well do it into an Elisp string so it can be more flexible.

> Or is there a convenience function to do that?

To do what?


        Stefan



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-09 22:28                                 ` Stefan Monnier
@ 2010-09-09 22:37                                   ` Lars Magne Ingebrigtsen
  2010-09-10  8:14                                     ` Andreas Schwab
  0 siblings, 1 reply; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-09 22:37 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Like in my original code, just copying it over to a C string?
>
> That would defeat the purpose: if you do the copying, then you might as
> well do it into an Elisp string so it can be more flexible.
>
>> Or is there a convenience function to do that?
>
> To do what?

To copy the contents of a buffer over to a C string?

I'm not familiar enough with the Emacs internals to know what's really
going on.  Is SDATA(lisp_string) a C string using (kinda) utf-8
representation these days?

Hm.  I guess so.  I'll change the html-parse-buffer function into a
html-parse-string function instead.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-09 22:37                                   ` Lars Magne Ingebrigtsen
@ 2010-09-10  8:14                                     ` Andreas Schwab
  2010-09-10 10:46                                       ` Stefan Monnier
  0 siblings, 1 reply; 70+ messages in thread
From: Andreas Schwab @ 2010-09-10  8:14 UTC (permalink / raw)
  To: emacs-devel

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Hm.  I guess so.  I'll change the html-parse-buffer function into a
> html-parse-string function instead.

I think it is still useful to provide a html-parse-buffer function that
can be more efficient when the intermediate string copy is not needed.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-10  8:14                                     ` Andreas Schwab
@ 2010-09-10 10:46                                       ` Stefan Monnier
  2010-09-10 10:56                                         ` Lars Magne Ingebrigtsen
                                                           ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Stefan Monnier @ 2010-09-10 10:46 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: emacs-devel

>> Hm.  I guess so.  I'll change the html-parse-buffer function into a
>> html-parse-string function instead.
> I think it is still useful to provide a html-parse-buffer function that
> can be more efficient when the intermediate string copy is not needed.

Agreed, but right now it's not clear when/how such an intermediate copy
can be avoided.

BTW, another question regarding libxml2: does it provide functions to do
partial parses (e.g. to know the syntactic context of a particular
buffer position)?


        Stefan



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-10 10:46                                       ` Stefan Monnier
@ 2010-09-10 10:56                                         ` Lars Magne Ingebrigtsen
  2010-09-10 12:37                                           ` Lars Magne Ingebrigtsen
  2010-09-10 11:37                                         ` Andreas Schwab
  2010-09-10 14:12                                         ` Andrew W. Nosenko
  2 siblings, 1 reply; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-10 10:56 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> BTW, another question regarding libxml2: does it provide functions to do
> partial parses (e.g. to know the syntactic context of a particular
> buffer position)?

libxml2 is a whopping huge piece of software.  It has several
partial-parsing interfaces, I think.  It's been a while since I looked
at those bits of libxml2, but I seem to recall at least one SAX-based
parsing thing with callbacks and stuff.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-10 10:46                                       ` Stefan Monnier
  2010-09-10 10:56                                         ` Lars Magne Ingebrigtsen
@ 2010-09-10 11:37                                         ` Andreas Schwab
  2010-09-10 14:12                                         ` Andrew W. Nosenko
  2 siblings, 0 replies; 70+ messages in thread
From: Andreas Schwab @ 2010-09-10 11:37 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> BTW, another question regarding libxml2: does it provide functions to do
> partial parses (e.g. to know the syntactic context of a particular
> buffer position)?

If it's only about handling the gap then it can always be moved out of
the region to parse.
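
As a rough illustration of what moving the gap means, here is a toy
gap buffer in C.  This is a sketch under simplifying assumptions:
toy_gap_buffer and toy_move_gap are made-up names, and Emacs's real
gap handling also deals with markers, multibyte text and more.  Once
the gap sits at the end of the text, the text is one contiguous byte
array that a parser could read without an intermediate copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy gap buffer: storage holds text, then the gap, then more text.
   These names are invented for this sketch.  */
struct toy_gap_buffer
{
  char *data;        /* storage, gap included */
  size_t gap_start;  /* number of text bytes before the gap */
  size_t gap_size;   /* number of bytes in the gap */
  size_t total;      /* size of the storage, gap included */
};

/* Move the gap so that exactly POS text bytes precede it.  */
static void
toy_move_gap (struct toy_gap_buffer *b, size_t pos)
{
  assert (pos <= b->total - b->gap_size);
  if (pos < b->gap_start)
    /* Shift the text in [pos, gap_start) up past the gap.  */
    memmove (b->data + pos + b->gap_size, b->data + pos,
             b->gap_start - pos);
  else if (pos > b->gap_start)
    /* Shift the text just after the gap down into it.  */
    memmove (b->data + b->gap_start,
             b->data + b->gap_start + b->gap_size,
             pos - b->gap_start);
  b->gap_start = pos;
}
```

Calling toy_move_gap with pos equal to the text length pushes the gap
to the very end, so the first total - gap_size bytes of data hold the
whole text in order.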

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-10 10:56                                         ` Lars Magne Ingebrigtsen
@ 2010-09-10 12:37                                           ` Lars Magne Ingebrigtsen
  2010-09-10 16:47                                             ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-10 12:37 UTC (permalink / raw)
  To: emacs-devel

Anyway, I'll be committing the libxml2 stuff this evening, and you can
play around with it, or change it, if you want to.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-10 10:46                                       ` Stefan Monnier
  2010-09-10 10:56                                         ` Lars Magne Ingebrigtsen
  2010-09-10 11:37                                         ` Andreas Schwab
@ 2010-09-10 14:12                                         ` Andrew W. Nosenko
  2 siblings, 0 replies; 70+ messages in thread
From: Andrew W. Nosenko @ 2010-09-10 14:12 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Andreas Schwab, emacs-devel

On Fri, Sep 10, 2010 at 13:46, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>>> Hm.  I guess so.  I'll change the html-parse-buffer function into a
>>> html-parse-string function instead.
>> I think it is still useful to provide a html-parse-buffer function that
>> can be more efficient when the intermediate string copy is not needed.
>
> Agreed, but right now it's not clear when/how such an intermediate copy
> can be avoided.

Try the xml*IO family of functions.  E.g., instead of xmlReadDoc(), try
xmlReadIO().  libxml2 doesn't need the whole document in memory to
parse it; the IO callbacks supply as much data as is needed whenever
the underlying parser requests it.
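
A minimal sketch of such a read callback, assuming the buffer text is
available as the two chunks before and after the gap.  The struct and
function names are hypothetical; only the callback's shape (context
pointer, destination buffer, maximum length, returning the number of
bytes supplied) is meant to match libxml2's xmlInputReadCallback.  The
sketch does not link against libxml2, so the wiring into xmlReadIO()
or htmlReadIO() is left as an untested assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader state: the text before and after the gap.  */
struct two_chunk_reader
{
  const char *chunk[2];  /* start of each chunk */
  size_t size[2];        /* length of each chunk in bytes */
  int current;           /* chunk being read; 2 once both are done */
  size_t offset;         /* bytes already consumed from that chunk */
};

/* Fill BUFFER with at most LEN bytes and return how many were
   supplied; 0 means end of input.  Intended to have the same shape
   as an xmlInputReadCallback.  */
static int
two_chunk_read (void *context, char *buffer, int len)
{
  struct two_chunk_reader *r = context;
  int copied = 0;

  assert (len >= 0);
  while (copied < len && r->current < 2)
    {
      size_t left = r->size[r->current] - r->offset;
      size_t want = (size_t) (len - copied);
      size_t n = left < want ? left : want;

      memcpy (buffer + copied, r->chunk[r->current] + r->offset, n);
      copied += (int) n;
      r->offset += n;
      if (r->offset == r->size[r->current])
        {
          /* This chunk is exhausted; continue with the next one.  */
          r->current++;
          r->offset = 0;
        }
    }
  return copied;
}
```

The parser would call the callback repeatedly with its own buffer, so
neither chunk has to be copied into an intermediate Lisp or C string
first.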

>
> BTW, another question regarding libxml2: does it provide functions to do
> partial parses (e.g. to know the syntactic context of a particular
> buffer position)?
>
>
>        Stefan
>
>



-- 
Andrew W. Nosenko <andrew.w.nosenko@gmail.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-10 12:37                                           ` Lars Magne Ingebrigtsen
@ 2010-09-10 16:47                                             ` Lars Magne Ingebrigtsen
  2010-09-10 16:54                                               ` Lars Magne Ingebrigtsen
                                                                 ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-10 16:47 UTC (permalink / raw)
  To: emacs-devel

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Anyway, I'll be committing the libxml2 stuff this evening, and you can
> play around with it, or change it, if you want to.

I've now done so.  Scary!  I hope I did the right bzr incantations...
that is, I just used `vc-dir' and checked in there hoping that it would
know more about these things than I do.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-10 16:47                                             ` Lars Magne Ingebrigtsen
@ 2010-09-10 16:54                                               ` Lars Magne Ingebrigtsen
  2010-09-10 17:05                                                 ` Ted Zlatanov
  2010-09-10 17:34                                                 ` Glenn Morris
  2010-09-10 21:12                                               ` Chad Brown
  2010-09-13 16:06                                               ` Christian Faulhammer
  2 siblings, 2 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-10 16:54 UTC (permalink / raw)
  To: emacs-devel

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> I've now done so.  Scary! 

No email from the Emacs diff list, though.  Did I just commit to the
local repository or what?  Hm.  

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-10 16:54                                               ` Lars Magne Ingebrigtsen
@ 2010-09-10 17:05                                                 ` Ted Zlatanov
  2010-09-10 17:14                                                   ` Lars Magne Ingebrigtsen
  2010-09-10 17:34                                                 ` Glenn Morris
  1 sibling, 1 reply; 70+ messages in thread
From: Ted Zlatanov @ 2010-09-10 17:05 UTC (permalink / raw)
  To: emacs-devel

On Fri, 10 Sep 2010 18:54:02 +0200 Lars Magne Ingebrigtsen <larsi@gnus.org> wrote: 

LMI> Lars Magne Ingebrigtsen <larsi@gnus.org> writes:
>> I've now done so.  Scary! 

LMI> No email from the Emacs diff list, though.  Did I just commit to the
LMI> local repository or what?  Hm.  

Depends on how you set it up.  If you followed the EmacsWiki guide, you
need to push from your quickfixes to your trunk mirror and then from the
trunk mirror to the origin.

Ted




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-10 17:05                                                 ` Ted Zlatanov
@ 2010-09-10 17:14                                                   ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-10 17:14 UTC (permalink / raw)
  To: emacs-devel

Ted Zlatanov <tzz@lifelogs.com> writes:

> Depends on how you set it up.  If you followed the EmacsWiki guide, you
> need to push from your quickfixes to your trunk mirror and then from the
> trunk mirror to the origin.

I used this guide:

  http://www.emacswiki.org/emacs/BzrForEmacsDevs

I created a "bound" repository, I think, with the "bzr bind" stuff...
Hm.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-10 16:54                                               ` Lars Magne Ingebrigtsen
  2010-09-10 17:05                                                 ` Ted Zlatanov
@ 2010-09-10 17:34                                                 ` Glenn Morris
  2010-09-10 17:41                                                   ` Glenn Morris
  1 sibling, 1 reply; 70+ messages in thread
From: Glenn Morris @ 2010-09-10 17:34 UTC (permalink / raw)
  To: emacs-devel

Lars Magne Ingebrigtsen wrote:

> Lars Magne Ingebrigtsen <larsi@gnus.org> writes:
>
>> I've now done so.  Scary! 
>
> No email from the Emacs diff list, though.

It only sends mail out periodically (hourly?), via cron, not
immediately after a commit. I see your changes:

revno: 101403
committer: Lars Magne Ingebrigtsen <larsi at gnus.org>
branch nick: trunk
timestamp: Fri 2010-09-10 18:44:35 +0200
message:
  Add support for the libxml2 library.
  



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-10 17:34                                                 ` Glenn Morris
@ 2010-09-10 17:41                                                   ` Glenn Morris
  2010-09-10 17:44                                                     ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 70+ messages in thread
From: Glenn Morris @ 2010-09-10 17:41 UTC (permalink / raw)
  To: Emacs developers


Glenn Morris wrote (on Fri, 10 Sep 2010 at 13:34 -0400):

> I see your changes:

PS so much for your "whitespace" purge. ;)
xml.c, cough.

PPS An etc/NEWS entry, please?



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-10 17:41                                                   ` Glenn Morris
@ 2010-09-10 17:44                                                     ` Lars Magne Ingebrigtsen
  2010-09-10 18:39                                                       ` Ted Zlatanov
  2010-09-12 16:56                                                       ` Andreas Schwab
  0 siblings, 2 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-10 17:44 UTC (permalink / raw)
  To: emacs-devel

Glenn Morris <rgm@gnu.org> writes:

> PS so much for your "whitespace" purge. ;)
> xml.c, cough.

Ouch.  I've gotten so used to `show-trailing-whitespace' that I thought
it was OK.

I'll fix the whitespace and check that in.

> PPS An etc/NEWS entry, please?

Will do.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-10 17:44                                                     ` Lars Magne Ingebrigtsen
@ 2010-09-10 18:39                                                       ` Ted Zlatanov
  2010-09-12 16:56                                                       ` Andreas Schwab
  1 sibling, 0 replies; 70+ messages in thread
From: Ted Zlatanov @ 2010-09-10 18:39 UTC (permalink / raw)
  To: emacs-devel

On Fri, 10 Sep 2010 19:44:31 +0200 Lars Magne Ingebrigtsen <larsi@gnus.org> wrote: 

LMI> Glenn Morris <rgm@gnu.org> writes:
>> PS so much for your "whitespace" purge. ;)
>> xml.c, cough.

LMI> Ouch.  I've gotten so used to `show-trailing-whitespace' that I thought
LMI> it was OK.

Could that be global for Emacs?  I can't imagine why it wouldn't be
desirable, at least for the src/ and lisp/ trees.

Ted




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-10 16:47                                             ` Lars Magne Ingebrigtsen
  2010-09-10 16:54                                               ` Lars Magne Ingebrigtsen
@ 2010-09-10 21:12                                               ` Chad Brown
  2010-09-10 21:40                                                 ` Lars Magne Ingebrigtsen
  2010-09-13 18:37                                                 ` Leo
  2010-09-13 16:06                                               ` Christian Faulhammer
  2 siblings, 2 replies; 70+ messages in thread
From: Chad Brown @ 2010-09-10 21:12 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

I had to tweak my macosx (10.6.4) install a bit to get it to recognize
the OS-included libxml2, but once I did, it seems to build and run fine.
Can you suggest a test case (or, even better, add one in tests/) so I
can see if it's working?

If you don't have an example handy, I'm sure I can put one together.

*Chad



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: Linking Emacs with libxml2
  2010-09-10 21:12                                               ` Chad Brown
@ 2010-09-10 21:40                                                 ` Lars Magne Ingebrigtsen
  2010-09-10 22:45                                                   ` chad
  2010-09-13 18:37                                                 ` Leo
  1 sibling, 1 reply; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-10 21:40 UTC (permalink / raw)
  To: emacs-devel

Chad Brown <yandros@MIT.EDU> writes:

> I had to tweak my macosx (10.6.4) install a bit to get it to recognize
> the OS-included libxml2, but once I did, it seems to build and run fine.
> Can you suggest a test case (or, even better, add one in tests/) so I
> can see if it's working?

Well, just put any HTML in a string and try it out...

My simple test case while fiddling with the syntax tree layout was the
one in the Emacs Lisp Reference Manual.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: Linking Emacs with libxml2
  2010-09-10 21:40                                                 ` Lars Magne Ingebrigtsen
@ 2010-09-10 22:45                                                   ` chad
  2010-09-10 23:19                                                     ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 70+ messages in thread
From: chad @ 2010-09-10 22:45 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel


On Sep 10, 2010, at 2:40 PM, Lars Magne Ingebrigtsen wrote:
> My simple test case while fiddling with the syntax tree layout was the
> one in the Emacs Lisp Reference Manual.

I was temporarily confused by `abbreviated display' and the example
I (randomly) chose, but it seems to work for me on macosx 10.6.4.

*Chad




* Re: Linking Emacs with libxml2
  2010-09-10 22:45                                                   ` chad
@ 2010-09-10 23:19                                                     ` Lars Magne Ingebrigtsen
  2010-09-11  7:18                                                       ` Andreas Schwab
  0 siblings, 1 reply; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-10 23:19 UTC (permalink / raw)
  To: emacs-devel

chad <yandros@gmail.com> writes:

> I was temporarily confused by `abbreviated display'

Yeah.  I've tried to disable that for ages, but I've never found the
right combinations of variables.  Setting print-level to nil, and
print-length to nil, and eval-expression-print-length to nil doesn't
give me a complete print of the following:

(html-parse-string "<html><hEad></head><body width=101><div class=thing>Hei<div>Yes")

You get a quite short parse tree back, but it's still abbreviated.  Is
that a bug, or is there yet another variable that abbreviates?

> and the example I (randomly) chose, but it seems to work for me on
> macosx 10.6.4.

Great!

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: Linking Emacs with libxml2
  2010-09-10 23:19                                                     ` Lars Magne Ingebrigtsen
@ 2010-09-11  7:18                                                       ` Andreas Schwab
  2010-09-11 12:48                                                         ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 70+ messages in thread
From: Andreas Schwab @ 2010-09-11  7:18 UTC (permalink / raw)
  To: emacs-devel

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> chad <yandros@gmail.com> writes:
>
>> I was temporarily confused by `abbreviated display'
>
> Yeah.  I've tried to disable that for ages, but I've never found the
> right combinations of variables.  Setting print-level to nil, and
> print-length to nil, and eval-expression-print-length to nil doesn't
> give me a complete print of the following:
>
> (html-parse-string "<html><hEad></head><body width=101><div class=thing>Hei<div>Yes")
>
> You get a quite short parse tree back, but it's still abbreviated.  Is
> that a bug, or is there yet another variable that abbreviates?

eval-expression-print-length and eval-expression-print-level are
overriding print-length and print-level during eval-expression.
Otherwise they have no effect.
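
To illustrate the "otherwise they have no effect" part, here is a rough
sketch run from a shell.  It assumes an `emacs' binary on PATH; the list
and the limit values are made up for the example.  Outside
eval-expression (here, a plain prin1 in batch mode) only print-length
governs truncation, and the eval-expression-print-length setting is
ignored.

```shell
# Assumption: an `emacs` binary is on PATH; otherwise the check is skipped.
show_print_vars() {
  # print-length 3 truncates what prin1 prints; the much larger
  # eval-expression-print-length is not consulted outside eval-expression.
  emacs -Q --batch --eval \
    '(let ((print-length 3) (eval-expression-print-length 100))
       (prin1 (number-sequence 1 10)))'
}

if command -v emacs >/dev/null 2>&1; then
  out=$(show_print_vars)
  echo "$out"
else
  out=""
  echo "emacs not found; skipping"
fi
```

For M-: itself the reverse holds: the eval-expression-* pair is what
counts there, so both eval-expression-print-length and
eval-expression-print-level would presumably need to be nil to get an
unabbreviated result.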

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: Linking Emacs with libxml2
  2010-09-11  7:18                                                       ` Andreas Schwab
@ 2010-09-11 12:48                                                         ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-11 12:48 UTC (permalink / raw)
  To: emacs-devel

Andreas Schwab <schwab@linux-m68k.org> writes:

> eval-expression-print-length and eval-expression-print-level are
> overriding print-length and print-level during eval-expression.
> Otherwise they have no effect.

Ah, right.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: Linking Emacs with libxml2
  2010-09-10 17:44                                                     ` Lars Magne Ingebrigtsen
  2010-09-10 18:39                                                       ` Ted Zlatanov
@ 2010-09-12 16:56                                                       ` Andreas Schwab
  2010-09-12 17:05                                                         ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 70+ messages in thread
From: Andreas Schwab @ 2010-09-12 16:56 UTC (permalink / raw)
  To: emacs-devel

The base-url argument of html-parse-string/xml-parse-string is not
documented.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: Linking Emacs with libxml2
  2010-09-12 16:56                                                       ` Andreas Schwab
@ 2010-09-12 17:05                                                         ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-12 17:05 UTC (permalink / raw)
  To: emacs-devel

Andreas Schwab <schwab@linux-m68k.org> writes:

> The base-url argument of html-parse-string/xml-parse-string is not
> documented.

Ok; I'll fix it.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: Linking Emacs with libxml2
  2010-09-10 16:47                                             ` Lars Magne Ingebrigtsen
  2010-09-10 16:54                                               ` Lars Magne Ingebrigtsen
  2010-09-10 21:12                                               ` Chad Brown
@ 2010-09-13 16:06                                               ` Christian Faulhammer
  2 siblings, 0 replies; 70+ messages in thread
From: Christian Faulhammer @ 2010-09-13 16:06 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel


Hi,

Lars Magne Ingebrigtsen <larsi@gnus.org>:

> Lars Magne Ingebrigtsen <larsi@gnus.org> writes:
> 
> > Anyway, I'll be committing the libxml2 stuff this evening, and you
> > can play around with it, or change it, if you want to.
> 
> I've now done so.  Scary!  I hope I did the right bzr incantations...
> that is, I just used `vc-dir' and checked in there hoping that it
> would know more about these things than I do.  :-)

 Gentoo users can now try it with USE=libxml2 on
app-editors/emacs-vcs-24.0.999.

V-Li

-- 
Christian Faulhammer, Gentoo Lisp project
<URL:http://www.gentoo.org/proj/en/lisp/>, #gentoo-lisp on FreeNode

<URL:http://gentoo.faulhammer.org/>



* Re: Linking Emacs with libxml2
  2010-09-10 21:12                                               ` Chad Brown
  2010-09-10 21:40                                                 ` Lars Magne Ingebrigtsen
@ 2010-09-13 18:37                                                 ` Leo
  2010-09-13 18:49                                                   ` Lars Magne Ingebrigtsen
  2010-09-13 19:16                                                   ` Chad Brown
  1 sibling, 2 replies; 70+ messages in thread
From: Leo @ 2010-09-13 18:37 UTC (permalink / raw)
  To: Chad Brown; +Cc: Lars Magne Ingebrigtsen, emacs-devel

On 2010-09-10 22:12 +0100, Chad Brown wrote:
> I had to tweak my macosx (10.6.4) install a bit to get it to recognize
> the OS-included libxml2, but once I did, it seems to build and run fine.
> Can you suggest a test case (or, even better, add one in tests/) so I
> can see if it's working?
>
> If you don't have an example handy, I'm sure I can put one together.
>
> *Chad

What did you do to get it to build with the system's libxml2? I am also
on 10.6.4. But there is no indication of whether libxml2 can be found
or not when running configure script.

Thanks.

Leo




* Re: Linking Emacs with libxml2
  2010-09-13 18:37                                                 ` Leo
@ 2010-09-13 18:49                                                   ` Lars Magne Ingebrigtsen
  2010-09-13 19:08                                                     ` Leo
  2010-09-13 19:16                                                   ` Chad Brown
  1 sibling, 1 reply; 70+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-09-13 18:49 UTC (permalink / raw)
  To: Leo; +Cc: Chad Brown, emacs-devel

Leo <sdl.web@gmail.com> writes:

> What did you do to get it to build with the system's libxml2? I am also
> on 10.6.4. But there is no indication of whether libxml2 can be found
> or not when running configure script.

./configure | grep xml

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




* Re: Linking Emacs with libxml2
  2010-09-13 18:49                                                   ` Lars Magne Ingebrigtsen
@ 2010-09-13 19:08                                                     ` Leo
  0 siblings, 0 replies; 70+ messages in thread
From: Leo @ 2010-09-13 19:08 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: Chad Brown, emacs-devel

On 2010-09-13 19:49 +0100, Lars Magne Ingebrigtsen wrote:
> Leo <sdl.web@gmail.com> writes:
>
>> What did you do to get it to build with the system's libxml2? I am also
>> on 10.6.4. But there is no indication of whether libxml2 can be found
>> or not when running configure script.
>
> ./configure | grep xml

It was due to my erroneous PKG_CONFIG_PATH setup. Now I am also hitting
the same problem David Reitter is seeing.

Thanks.
Leo




* Re: Linking Emacs with libxml2
  2010-09-13 18:37                                                 ` Leo
  2010-09-13 18:49                                                   ` Lars Magne Ingebrigtsen
@ 2010-09-13 19:16                                                   ` Chad Brown
  2010-09-13 19:23                                                     ` Chad Brown
  2010-09-13 22:24                                                     ` Leo
  1 sibling, 2 replies; 70+ messages in thread
From: Chad Brown @ 2010-09-13 19:16 UTC (permalink / raw)
  To: Leo; +Cc: Emacs-Devel devel



On Sep 13, 2010, at 11:37 AM, Leo wrote:
> 
> What did you do to get it to build with the system's libxml2? I am also
> on 10.6.4. But there is no indication of whether libxml2 can be found
> or not when running configure script.

Set PKG_CONFIG_PATH so that /usr/lib/pkgconfig comes early in the list.

I just set mine to /usr/lib/pkgconfig:/usr/local/lib/pkgconfig directly, but 
I'm not using fink or MacPorts or any other automatic package system,
and I don't have Gnome installed.
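
As a rough sketch of that setup (assuming a POSIX shell and the paths
mentioned above; `libxml-2.0' is the module name libxml2 registers with
pkg-config), one can set the variable and then ask pkg-config whether it
sees the library before running configure:

```shell
# Put the system pkg-config directory first so its libxml-2.0.pc is found
# before copies installed by other package systems.
PKG_CONFIG_PATH="/usr/lib/pkgconfig:/usr/local/lib/pkgconfig"
export PKG_CONFIG_PATH

# Check whether pkg-config can see libxml2 with this search path.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists libxml-2.0; then
  echo "libxml2 found: $(pkg-config --modversion libxml-2.0)"
else
  echo "libxml2 not visible to pkg-config"
fi
```

If the check succeeds, `./configure | grep xml' run from the same shell
should then report libxml2 as well.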

*Chad



* Re: Linking Emacs with libxml2
  2010-09-13 19:16                                                   ` Chad Brown
@ 2010-09-13 19:23                                                     ` Chad Brown
  2010-09-13 22:24                                                     ` Leo
  1 sibling, 0 replies; 70+ messages in thread
From: Chad Brown @ 2010-09-13 19:23 UTC (permalink / raw)
  To: Leo; +Cc: Emacs-Devel devel


It looks like there might also be an issue with fink/MacPorts/etc using an 
out-of-date version of pkg-config itself.

*Chad


On Sep 13, 2010, at 12:16 PM, Chad Brown wrote:

> 
> On Sep 13, 2010, at 11:37 AM, Leo wrote:
>> 
>> What did you do to get it to build with the system's libxml2? I am also
>> on 10.6.4. But there is no indication of whether libxml2 can be found
>> or not when running configure script.
> 
> set PKG_CONFIG_PATH to include /usr/lib/pkgconfig early.  
> 
> I just set mine to /usr/lib/pkgconfig:/usr/local/lib/pkgconfig directly, but 
> I'm not using fink or MacPorts or any other automatic package system,
> and I don't have Gnome installed.
> 
> *Chad




* Re: Linking Emacs with libxml2
  2010-09-13 19:16                                                   ` Chad Brown
  2010-09-13 19:23                                                     ` Chad Brown
@ 2010-09-13 22:24                                                     ` Leo
  1 sibling, 0 replies; 70+ messages in thread
From: Leo @ 2010-09-13 22:24 UTC (permalink / raw)
  To: Chad Brown; +Cc: Emacs-Devel devel

On 2010-09-13 20:16 +0100, Chad Brown wrote:
> On Sep 13, 2010, at 11:37 AM, Leo wrote:
>> 
>> What did you do to get it to build with the system's libxml2? I am also
>> on 10.6.4. But there is no indication of whether libxml2 can be found
>> or not when running configure script.
>
> set PKG_CONFIG_PATH to include /usr/lib/pkgconfig early.  
>
> I just set mine to /usr/lib/pkgconfig:/usr/local/lib/pkgconfig directly, but 
> I'm not using fink or MacPorts or any other automatic package system,
> and I don't have Gnome installed.
>
> *Chad

Thanks. Successfully built it.

Leo




* Re: Linking Emacs with libxml2
  2010-09-06 19:19 ` Chong Yidong
  2010-09-06 21:03   ` Lars Magne Ingebrigtsen
@ 2010-09-15  0:55   ` Eric M. Ludlam
  2010-09-15 15:52     ` Ted Zlatanov
  1 sibling, 1 reply; 70+ messages in thread
From: Eric M. Ludlam @ 2010-09-15  0:55 UTC (permalink / raw)
  To: Chong Yidong; +Cc: Eric M. Ludlam, emacs-devel

Hi,

Sorry for the late reply.

The Semantic parser is a cheap regexp matcher that just looks for titles 
and <Hx> lines so things like Speedbar can show a high-level overview of 
your text.

It would be much better to use a real parser if one is available to 
provide to have that info in one place.

Eric

On 09/06/2010 03:19 PM, Chong Yidong wrote:
> Lars Magne Ingebrigtsen <larsi@gnus.org> writes:
>
>> Apparently libxml2 comes with a parser for "real world" HTML, which is
>> very intriguing:
>>
>> http://www.xmlsoft.org/html/libxml-HTMLparser.html
>>
>> If Emacs provided a native interface to this function, we could say
>>
>> (parse-html "file.html")
>> =>  (:html (:head ...) (:body ...))
>>
>> and get a nice parse tree out very fast.  (Parsing HTML from Emacs Lisp
>> is rather slow.)
>
> Semantic already has a HTML parser, but I don't know how usable it is
> for the purposes of writing a renderer.
>




* Re: Linking Emacs with libxml2
  2010-09-15  0:55   ` Eric M. Ludlam
@ 2010-09-15 15:52     ` Ted Zlatanov
  0 siblings, 0 replies; 70+ messages in thread
From: Ted Zlatanov @ 2010-09-15 15:52 UTC (permalink / raw)
  To: emacs-devel

On Tue, 14 Sep 2010 20:55:23 -0400 "Eric M. Ludlam" <eric@siege-engine.com> wrote: 

EML> The Semantic parser is a cheap regexp matcher that just looks for
EML> titles and <Hx> lines so things like Speedbar can show a high-level
EML> overview of your text.

EML> It would be much better to use a real parser if one is available to
EML> provide to have that info in one place.

Are you suggesting Semantic could use the libxml2 parser?  Your last
sentence is a little hard to parse.

Ted





