From mboxrd@z Thu Jan 1 00:00:00 1970 From: Leo Famulari Subject: Re: [PATCH 3/4] gnu: Add Swish-e. Date: Tue, 23 Aug 2016 16:46:51 -0400 Message-ID: <20160823204651.GA23118@jasmine> References: <20160823061512.13024-1-ericbavier@openmailbox.org> <20160823061512.13024-3-ericbavier@openmailbox.org> <20160823062726.GA6297@jasmine> <20160823014912.1f80a48d@openmailbox.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:40103) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bcIbJ-0007VF-D2 for guix-devel@gnu.org; Tue, 23 Aug 2016 16:47:14 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bcIbD-0003Af-QR for guix-devel@gnu.org; Tue, 23 Aug 2016 16:47:12 -0400 Received: from out2-smtp.messagingengine.com ([66.111.4.26]:42891) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bcIbB-0003A8-Bi for guix-devel@gnu.org; Tue, 23 Aug 2016 16:47:07 -0400 Content-Disposition: inline In-Reply-To: <20160823014912.1f80a48d@openmailbox.org> List-Id: "Development of GNU Guix and the GNU System distribution." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org Sender: "Guix-devel" To: Eric Bavier Cc: guix-devel@gnu.org On Tue, Aug 23, 2016 at 01:49:12AM -0500, Eric Bavier wrote: > * gnu/packages/search.scm (swish-e): New variable. > * gnu/packages/patches/swish-e-search.patch, > gnu/packages/patches/swish-e-format-security.patch: New patches. > * gnu/local.mk (dist_patch_DATA): Add them. It would be ideal to present these patches to the upstream maintainers, but their site is offline. Do we know if the project is still active? > diff --git a/gnu/packages/patches/swish-e-format-security.patch b/gnu/packages/patches/swish-e-format-security.patch > new file mode 100644 > index 0000000..be9d7cb > --- /dev/null > +++ b/gnu/packages/patches/swish-e-format-security.patch > @@ -0,0 +1,33 @@ > +Borrowed from Debian. > + > +--- swish-e-2.4.7/src/parser.c 2009-04-05 03:58:32.000000000 +0200 > ++++ swish-e-2.4.7/src/parser.c 2013-06-11 13:53:08.196559035 +0200 > +@@ -1760,7 +1760,7 @@ > + va_start(args, msg); > + vsnprintf(str, 1000, msg, args ); > + va_end(args); > +- xmlParserError(parse_data->ctxt, str); > ++ xmlParserError(parse_data->ctxt, "%s", str); > + } > + > + static void warning(void *data, const char *msg, ...) > +@@ -1772,7 +1772,7 @@ > + va_start(args, msg); > + vsnprintf(str, 1000, msg, args ); > + va_end(args); > +- xmlParserWarning(parse_data->ctxt, str); > ++ xmlParserWarning(parse_data->ctxt, "%s", str); > + } My understanding is that xmlParserWarning() is from libxml2, defined in 'xmlerror.h' like this: XMLPUBFUN void XMLCDECL xmlParserWarning (void *ctx, const char *msg, ...) LIBXML_ATTR_FORMAT(2,3); I don't understand this definition very much, but in libxml2 file 'xmlversion.h', LIBXML_ATTR_FORMAT is commented with "Macro used to indicate to GCC the parameter are printf like". Somebody else should review this. > +--- swish-e-2.4.7/src/result_output.c 2009-04-05 03:58:32.000000000 +0200 > ++++ swish-e-2.4.7/src/result_output.c 2013-06-11 13:53:38.593550825 +0200 > +@@ -752,7 +752,7 @@ > + s = (char *) emalloc(MAXWORDLEN + 1); > + n = strftime(s, (size_t) MAXWORDLEN, fmt, localtime(&(pv->value.v_date))); > + if (n && f) > +- fprintf(f, s); > ++ fprintf(f, "%s", s); LGTM > diff --git a/gnu/packages/patches/swish-e-search.patch b/gnu/packages/patches/swish-e-search.patch > new file mode 100644 > index 0000000..2a57a31 > --- /dev/null > +++ b/gnu/packages/patches/swish-e-search.patch > @@ -0,0 +1,43 @@ > +From http://swish-e.org/archive/2015-09/13295.html The site is offline, but I found it on archive.org: https://web.archive.org/web/20150907203848/http://www.swish-e.org/archive/2015-09/13295.html Interestingly, I'm a few blocks the patch author's office :) As far as I can tell, nobody from swish-e ever replied. > + > +--- a/src/compress.c > ++++ a/src/compress.c > +@@ -995,7 +995,7 @@ void remove_worddata_longs(unsigned char *worddata,int *sz_worddata) > + progerr("Internal error in remove_worddata_longs"); > + > + /* dst may be smaller than src. So move the data */ > +- memcpy(dst,src,data_len); > ++ memmove(dst,src,data_len); LGTM > + > + /* Increase pointers */ > + src += data_len; > +--- a/src/headers.c > ++++ a/src/headers.c > +@@ -280,7 +280,7 @@ static SWISH_HEADER_VALUE fetch_single_header( IndexFILE *indexf, HEADER_MAP *he > + > + case SWISH_NUMBER: > + case SWISH_BOOL: > +- value.number = *(unsigned long *) data_pointer; > ++ value.number = *(unsigned int *) data_pointer; Could there be any risk in reducing the size of the variable like this? > + > + /* $$$ Ugly hack alert! */ > + /* correct for removed files */ > +--- a/src/swishspider > ++++ a/src/swishspider > +@@ -27,6 +27,7 @@ use LWP::UserAgent; > + use HTTP::Status; > + use HTML::Parser 3.00; > + use HTML::LinkExtor; > ++use Encode; > + > + if (scalar(@ARGV) != 2) { > + print STDERR "Usage: $0 localpath url\n"; > +@@ -94,7 +95,7 @@ use HTML::LinkExtor; > + # Don't allow links above the base > + $URI::ABS_REMOTE_LEADING_DOTS = 1; > + > +- $p->parse( $$content_ref ); > ++ $p->parse( decode_utf8 $$content_ref ); Can you explain why we need this? > +(define-public swish-e The package definition LGTM. I did not try to build it.