From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lars Magne Ingebrigtsen
Newsgroups: gmane.emacs.devel
Subject: Re: Linking Emacs with libxml2
Date: Wed, 08 Sep 2010 18:15:25 +0200
Organization: Programmerer Ingebrigtsen
References: <8A20526E-44B3-4434-9D40-54A36F976CD6@mit.edu> <4C85892B.5080105@gnu.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
To: emacs-devel@gnu.org
Mail-Followup-To: emacs-devel@gnu.org
User-Agent: Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (gnu/linux)
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:129790

--=-=-=
Content-Type: text/plain

I did it the hard way:

--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline; filename=libxml.diff

=== modified file 'ChangeLog'
--- ChangeLog 2010-09-04 07:30:14 +0000
+++ ChangeLog 2010-09-08 16:12:36 +0000
@@ -1,3 +1,7 @@
+2010-09-08 Lars Magne Ingebrigtsen
+
+ * configure.in: Check for libxml2/htmlReadMemory().
+
 2010-09-04 Eli Zaretskii
 
  * config.bat: Produce lisp/gnus/_dir-locals.el from

=== modified file 'configure'
--- configure 2010-08-23 12:54:09 +0000
+++ configure 2010-09-08 15:55:18 +0000
@@ -660,6 +660,8 @@
 LIBS_MAIL
 liblockfile
 ALLOCA
+LIBXML2_CFLAGS
+LIBXML2_LIBS
 LIBXSM
 LIBGPM
 LIBGIF
@@ -11070,6 +11072,74 @@
 fi
 
 
+### Use libxml2 (-lxml2) if available
+HAVE_LIBXML2=no
+LIBXML2_LIBS=
+if test -n xml2-config; then
+  LIBXML2_CFLAGS="`xml2-config --cflags`"
+  SAVE_CFLAGS="$CFLAGS"
+  CFLAGS="$LIBXML2_CFLAGS $CFLAGS"
+  ac_fn_c_check_header_mongrel "$LINENO" "libxml/xmlexports.h" "ac_cv_header_libxml_xmlexports_h" "$ac_includes_default"
+if test "x$ac_cv_header_libxml_xmlexports_h" = x""yes; then :
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for htmlReadMemory in -lxml2" >&5
+$as_echo_n "checking for htmlReadMemory in -lxml2... " >&6; }
+if test "${ac_cv_lib_xml2_htmlReadMemory+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lxml2 -lxml2 $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h. */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply. */
+#ifdef __cplusplus
+extern "C"
+#endif
+char htmlReadMemory ();
+int
+main ()
+{
+return htmlReadMemory ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_xml2_htmlReadMemory=yes
+else
+  ac_cv_lib_xml2_htmlReadMemory=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_xml2_htmlReadMemory" >&5
+$as_echo "$ac_cv_lib_xml2_htmlReadMemory" >&6; }
+if test "x$ac_cv_lib_xml2_htmlReadMemory" = x""yes; then :
+  HAVE_LIBXML2=yes
+fi
+
+fi
+
+
+
+  if test "${HAVE_LIBXML2}" = "yes"; then
+
+$as_echo "#define HAVE_LIBXML2 1" >>confdefs.h
+
+    LIBXML2_LIBS="-lxml2"
+    case "$LIBS" in
+      *-lxml2*) ;;
+      *) LIBS="$LIBXML2_LIBS $LIBS" ;;
+    esac
+  fi
+  CFLAGS="$SAVE_CFLAGS"
+fi
+
+
+
 # If netdb.h doesn't declare h_errno, we must declare it by hand.
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether netdb declares h_errno" >&5
 $as_echo_n "checking whether netdb declares h_errno... " >&6; }

=== modified file 'configure.in'
--- configure.in 2010-08-23 12:54:09 +0000
+++ configure.in 2010-09-08 15:55:38 +0000
@@ -2535,6 +2535,29 @@
 fi
 AC_SUBST(LIBXSM)
 
+### Use libxml2 (-lxml2) if available
+HAVE_LIBXML2=no
+LIBXML2_LIBS=
+if test -n xml2-config; then
+  LIBXML2_CFLAGS="`xml2-config --cflags`"
+  SAVE_CFLAGS="$CFLAGS"
+  CFLAGS="$LIBXML2_CFLAGS $CFLAGS"
+  AC_CHECK_HEADER(libxml/xmlversion.h,
+    [AC_CHECK_LIB(xml2, htmlReadMemory, HAVE_LIBXML2=yes, , -lxml2)])
+
+  if test "${HAVE_LIBXML2}" = "yes"; then
+    AC_DEFINE(HAVE_LIBXML2, 1, [Define to 1 if you have the libxml2 library (-lxml2).])
+    LIBXML2_LIBS="-lxml2"
+    case "$LIBS" in
+      *-lxml2*) ;;
+      *) LIBS="$LIBXML2_LIBS $LIBS" ;;
+    esac
+  fi
+  CFLAGS="$SAVE_CFLAGS"
+fi
+AC_SUBST(LIBXML2_LIBS)
+AC_SUBST(LIBXML2_CFLAGS)
+
 # If netdb.h doesn't declare h_errno, we must declare it by hand.
 AC_CACHE_CHECK(whether netdb declares h_errno,
   emacs_cv_netdb_declares_h_errno,

=== modified file 'src/ChangeLog'
--- src/ChangeLog 2010-09-05 02:06:39 +0000
+++ src/ChangeLog 2010-09-08 16:12:09 +0000
@@ -1,3 +1,9 @@
+2010-09-08 Lars Magne Ingebrigtsen
+
+ * xml.c: New file.
+ (Fhtml_parse_buffer): New function to interface to the libxml2
+ html parsing function.
+
 2010-09-05 Juanma Barranquero
 
  * biditype.h: Regenerate.

=== modified file 'src/Makefile.in'
--- src/Makefile.in 2010-08-17 21:19:11 +0000
+++ src/Makefile.in 2010-09-08 15:52:01 +0000
@@ -226,6 +226,9 @@
 IMAGEMAGICK_LIBS= @IMAGEMAGICK_LIBS@
 IMAGEMAGICK_CFLAGS= @IMAGEMAGICK_CFLAGS@
 
+LIBXML2_LIBS = @LIBXML2_LIBS@
+LIBXML2_CFLAGS = @LIBXML2_CFLAGS@
+
 ## widget.o if USE_X_TOOLKIT, otherwise empty.
 WIDGET_OBJ=@WIDGET_OBJ@
 
@@ -320,7 +323,8 @@
 ## FIXME? MYCPPFLAGS only referenced in etc/DEBUG.
 ALL_CFLAGS=-Demacs -DHAVE_CONFIG_H $(MYCPPFLAGS) -I. -I${srcdir} \
   ${C_SWITCH_MACHINE} ${C_SWITCH_SYSTEM} ${C_SWITCH_X_SITE} \
-  ${C_SWITCH_X_SYSTEM} ${CFLAGS_SOUND} ${RSVG_CFLAGS} ${IMAGEMAGICK_CFLAGS} ${DBUS_CFLAGS} \
+  ${C_SWITCH_X_SYSTEM} ${CFLAGS_SOUND} ${RSVG_CFLAGS} ${IMAGEMAGICK_CFLAGS} \
+  ${LIBXML2_CFLAGS} ${DBUS_CFLAGS} \
   ${GCONF_CFLAGS} ${FREETYPE_CFLAGS} ${FONTCONFIG_CFLAGS} \
   ${LIBOTF_CFLAGS} ${M17N_FLT_CFLAGS} ${DEPFLAGS} ${PROFILING_CFLAGS} \
   ${C_WARNINGS_SWITCH} ${CFLAGS}
@@ -349,7 +353,7 @@
   syntax.o $(UNEXEC_OBJ) bytecode.o \
   process.o callproc.o \
   region-cache.o sound.o atimer.o \
-  doprnt.o strftime.o intervals.o textprop.o composite.o md5.o \
+  doprnt.o strftime.o intervals.o textprop.o composite.o md5.o xml.o \
   $(MSDOS_OBJ) $(MSDOS_X_OBJ) $(NS_OBJ) $(CYGWIN_OBJ) $(FONT_OBJ)
 
 ## Object files used on some machine or other.
@@ -595,7 +599,8 @@
 ## duplicated symbols. If the standard libraries were compiled
 ## with GCC, we might need LIB_GCC again after them.
 LIBES = $(LIBS) $(LIBX_BASE) $(LIBX_OTHER) $(LIBSOUND) \
-   $(RSVG_LIBS) ${IMAGEMAGICK_LIBS} $(DBUS_LIBS) $(LIBGPM) $(LIBRESOLV) $(LIBS_SYSTEM) \
+   $(RSVG_LIBS) ${IMAGEMAGICK_LIBS} $(DBUS_LIBS) \
+   ${LIBXML2_LIBS} $(LIBGPM) $(LIBRESOLV) $(LIBS_SYSTEM) \
    $(LIBS_TERMCAP) $(GETLOADAVG_LIBS) ${GCONF_LIBS} ${LIBSELINUX_LIBS} \
    $(FREETYPE_LIBS) $(FONTCONFIG_LIBS) $(LIBOTF_LIBS) $(M17N_FLT_LIBS) \
    $(LIB_GCC) $(LIB_MATH) $(LIB_STANDARD) $(LIB_GCC)

=== modified file 'src/config.in'
--- src/config.in 2010-08-17 21:19:11 +0000
+++ src/config.in 2010-09-08 15:37:34 +0000
@@ -813,6 +813,9 @@
 /* Define to 1 if you have the SM library (-lSM). */
 #undef HAVE_X_SM
 
+/* Define to 1 if you have the libxml2 library (-lxml2). */
+#undef HAVE_LIBXML2
+
 /* Define to 1 if you want to use the X window system. */
 #undef HAVE_X_WINDOWS
 

=== modified file 'src/emacs.c'
--- src/emacs.c 2010-08-22 21:15:20 +0000
+++ src/emacs.c 2010-09-08 13:39:17 +0000
@@ -1543,6 +1543,7 @@
       syms_of_xselect ();
 #endif
 #endif /* HAVE_X_WINDOWS */
+      syms_of_xml ();
 
       syms_of_menu ();
 

=== modified file 'src/lisp.h'
--- src/lisp.h 2010-08-09 19:25:41 +0000
+++ src/lisp.h 2010-09-08 13:40:50 +0000
@@ -3559,6 +3559,9 @@
 /* Defined in xsmfns.c */
 extern void syms_of_xsmfns (void);
 
+/* Defined in xml.c */
+extern void syms_of_xml (void);
+
 /* Defined in xselect.c */
 EXFUN (Fx_send_client_event, 6);
 extern void syms_of_xselect (void);

=== added file 'src/xml.c'
--- src/xml.c 1970-01-01 00:00:00 +0000
+++ src/xml.c 2010-09-08 16:10:36 +0000
@@ -0,0 +1,131 @@
+/* Interface to libxml2.
+   Copyright (C) 2010 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Emacs. If not, see . */
+
+#include
+
+#ifdef HAVE_LIBXML2
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "lisp.h"
+#include "systime.h"
+#include "sysselect.h"
+#include "frame.h"
+#include "buffer.h"
+
+Lisp_Object make_dom (xmlNode *node)
+{
+  Lisp_Object result = Qnil;
+  xmlNode *child;
+  xmlAttr *property;
+
+  if (node != NULL) {
+    result = Fcons (Fintern (build_string (node->name),
+                             Vobarray),
+                    Qnil);
+    property = node->properties;
+    while (property != NULL) {
+      if (property->children &&
+          property->children->content) {
+        char *pname = xmalloc(strlen(property->name) + 2);
+        *pname = ':';
+        strcpy(pname + 1, property->name);
+        result = Fcons (Fcons (Fintern (build_string (pname), Vobarray),
+                               build_string(property->children->content)),
+                        result);
+        xfree (pname);
+      }
+      property = property->next;
+    }
+    child = node->children;
+    while (child != NULL) {
+      result = Fcons (make_dom (child), result);
+      child = child->next;
+    }
+    if (node->content)
+      result = Fcons (Fcons (Fintern (build_string ("text"), Vobarray),
+                             build_string(node->content)),
+                      result);
+  }
+  return Fnreverse(result);
+}
+
+DEFUN ("html-parse-buffer", Fhtml_parse_buffer, Shtml_parse_buffer,
+       0, 1, 0,
+       doc: /* Parse the buffer as an HTML document and return the parse tree.*/)
+  (Lisp_Object object)
+{
+  xmlDoc *doc;
+  struct buffer *buffer;
+  xmlNode *node;
+  unsigned char *string, *s;
+  Lisp_Object result;
+  int ibeg, iend;
+
+  LIBXML_TEST_VERSION
+
+  if (NILP (object))
+    buffer = current_buffer;
+  else {
+    CHECK_BUFFER (object);
+    buffer = XBUFFER (object);
+  }
+
+  ibeg = CHAR_TO_BYTE (XFASTINT (Fpoint_min ()));
+  iend = CHAR_TO_BYTE (XFASTINT (Fpoint_max ()));
+  move_gap_both (XFASTINT (Fpoint_min ()), ibeg);
+
+  string = (unsigned char *) xmalloc (iend - ibeg + 1);
+  s = string;
+
+  while (ibeg < iend) {
+    *s++ = *(BYTE_POS_ADDR (ibeg));
+    ibeg++;
+  }
+  *s = 0;
+
+  doc = htmlReadMemory (string, strlen(string), "", "utf-8", 0);
+
+  if (doc == NULL)
+    return Qnil;
+
+  node = xmlDocGetRootElement (doc);
+  result = make_dom (node);
+
+  xmlFreeDoc(doc);
+  xmlCleanupParser();
+
+  return result;
+}
+
+
+/***********************************************************************
+                            Initialization
+ ***********************************************************************/
+void
+syms_of_xml (void)
+{
+  defsubr (&Shtml_parse_buffer);
+}
+
+#endif /* HAVE_LIBXML2 */

--=-=-=
Content-Type: text/plain

This compiles and works for me, but I'm not really an Emacs internals
expert. Ahem. Or an autoconf one, for that matter.

./configure finds the stuff it's looking for, but I get this warning:

-------
[larsi@quimbies ~/src/emacs/trunk]$ ./configure | grep xml
checking libxml/xmlversion.h usability... yes
checking libxml/xmlversion.h presence... no
configure: WARNING: libxml/xmlversion.h: accepted by the compiler, rejected by the preprocessor!
configure: WARNING: libxml/xmlversion.h: proceeding with the compiler's result
checking for libxml/xmlversion.h... yes
checking for htmlReadMemory in -lxml2... yes
-------

I'm not sure what that means...

-- 
(domestic pets only, the antidote for overdose, milk.)
larsi@gnus.org * Lars Magne Ingebrigtsen
--=-=-=--
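
A possible reading of the warning quoted above, offered as a sketch rather than a diagnosis: the autoconf header check first compiles a test file (the compiler sees CFLAGS) and then runs the bare preprocessor (which sees only CPPFLAGS), and the configure.in fragment adds the xml2-config include path to CFLAGS alone. Separately, `test -n xml2-config` tests a literal string, so it passes whether or not the program is installed. The shell below illustrates both points. The names literal_check and have_xml2_config are made up for this example, and it assumes a POSIX sh that provides `command -v`.

```shell
#!/bin/sh
# Sketch only: two shell details behind the configure fragment above.

# 1. `test -n WORD` only checks that WORD is a non-empty string, so a
#    literal program name always passes, installed or not.
if test -n xml2-config; then
  literal_check=yes
else
  literal_check=no
fi
echo "test -n on a literal name: $literal_check"

# 2. One way to ask whether the program is actually on PATH.
if command -v xml2-config >/dev/null 2>&1; then
  have_xml2_config=yes
else
  have_xml2_config=no
fi
echo "xml2-config on PATH: $have_xml2_config"

# 3. The header "presence" test runs the preprocessor, which reads
#    CPPFLAGS rather than CFLAGS, so the include path would go there too.
#    (Assumption: this is what triggers the warning in the post.)
if test "$have_xml2_config" = yes; then
  LIBXML2_CFLAGS=$(xml2-config --cflags)
  SAVE_CPPFLAGS="$CPPFLAGS"
  CPPFLAGS="$LIBXML2_CFLAGS $CPPFLAGS"
  echo "CPPFLAGS during the header check: $CPPFLAGS"
  CPPFLAGS="$SAVE_CPPFLAGS"
fi
```

In configure.in the same idea would mean saving and restoring CPPFLAGS next to CFLAGS around AC_CHECK_HEADER; whether that is the right fix for the Emacs tree is left open here.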