From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Bavier Subject: Re: [PATCH 3/4] gnu: Add Swish-e. Date: Tue, 23 Aug 2016 17:34:45 -0500 Message-ID: <20160823173445.0364aeaa@openmailbox.org> References: <20160823061512.13024-1-ericbavier@openmailbox.org> <20160823061512.13024-3-ericbavier@openmailbox.org> <20160823062726.GA6297@jasmine> <20160823014912.1f80a48d@openmailbox.org> <20160823204651.GA23118@jasmine> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:54182) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bcKHY-0001RT-8H for guix-devel@gnu.org; Tue, 23 Aug 2016 18:34:57 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bcKHU-0002vo-UL for guix-devel@gnu.org; Tue, 23 Aug 2016 18:34:56 -0400 Received: from mail.openmailbox.org ([62.4.1.34]:49457) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bcKHU-0002v1-HQ for guix-devel@gnu.org; Tue, 23 Aug 2016 18:34:52 -0400 In-Reply-To: <20160823204651.GA23118@jasmine> List-Id: "Development of GNU Guix and the GNU System distribution." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org Sender: "Guix-devel" To: Leo Famulari Cc: guix-devel@gnu.org On Tue, 23 Aug 2016 16:46:51 -0400 Leo Famulari wrote: > On Tue, Aug 23, 2016 at 01:49:12AM -0500, Eric Bavier wrote: > > * gnu/packages/search.scm (swish-e): New variable. > > * gnu/packages/patches/swish-e-search.patch, > > gnu/packages/patches/swish-e-format-security.patch: New patches. > > * gnu/local.mk (dist_patch_DATA): Add them. > > It would be ideal to present these patches to the upstream maintainers, > but their site is offline. Do we know if the project is still active? The last active maintainer stepped out a while ago, and it seems no one else has stepped up: https://web.archive.org/web/20150908004634/http://www.swish-e.org/archive/2014-04/13214.html > > > diff --git a/gnu/packages/patches/swish-e-format-security.patch b/gnu/packages/patches/swish-e-format-security.patch > > new file mode 100644 > > index 0000000..be9d7cb > > --- /dev/null > > +++ b/gnu/packages/patches/swish-e-format-security.patch > > @@ -0,0 +1,33 @@ > > +Borrowed from Debian. > > + > > +--- swish-e-2.4.7/src/parser.c 2009-04-05 03:58:32.000000000 +0200 > > ++++ swish-e-2.4.7/src/parser.c 2013-06-11 13:53:08.196559035 +0200 > > +@@ -1760,7 +1760,7 @@ > > + va_start(args, msg); > > + vsnprintf(str, 1000, msg, args ); > > + va_end(args); > > +- xmlParserError(parse_data->ctxt, str); > > ++ xmlParserError(parse_data->ctxt, "%s", str); > > + } > > + > > + static void warning(void *data, const char *msg, ...) > > +@@ -1772,7 +1772,7 @@ > > + va_start(args, msg); > > + vsnprintf(str, 1000, msg, args ); > > + va_end(args); > > +- xmlParserWarning(parse_data->ctxt, str); > > ++ xmlParserWarning(parse_data->ctxt, "%s", str); > > + } > > My understanding is that xmlParserWarning() is from libxml2, defined in > 'xmlerror.h' like this: > > XMLPUBFUN void XMLCDECL > xmlParserWarning (void *ctx, > const char *msg, > ...) LIBXML_ATTR_FORMAT(2,3); > > I don't understand this definition very much, but in libxml2 file > 'xmlversion.h', LIBXML_ATTR_FORMAT is commented with "Macro used to > indicate to GCC the parameter are printf like". > > Somebody else should review this. > > > +--- swish-e-2.4.7/src/result_output.c 2009-04-05 03:58:32.000000000 +0200 > > ++++ swish-e-2.4.7/src/result_output.c 2013-06-11 13:53:38.593550825 +0200 > > +@@ -752,7 +752,7 @@ > > + s = (char *) emalloc(MAXWORDLEN + 1); > > + n = strftime(s, (size_t) MAXWORDLEN, fmt, localtime(&(pv->value.v_date))); > > + if (n && f) > > +- fprintf(f, s); > > ++ fprintf(f, "%s", s); > > LGTM > > > diff --git a/gnu/packages/patches/swish-e-search.patch b/gnu/packages/patches/swish-e-search.patch > > new file mode 100644 > > index 0000000..2a57a31 > > --- /dev/null > > +++ b/gnu/packages/patches/swish-e-search.patch > > @@ -0,0 +1,43 @@ > > +From http://swish-e.org/archive/2015-09/13295.html > > The site is offline, but I found it on archive.org: > https://web.archive.org/web/20150907203848/http://www.swish-e.org/archive/2015-09/13295.html > > Interestingly, I'm a few blocks the patch author's office :) > > As far as I can tell, nobody from swish-e ever replied. AFAICT that right. > > > + > > +--- a/src/compress.c > > ++++ a/src/compress.c > > +@@ -995,7 +995,7 @@ void remove_worddata_longs(unsigned char *worddata,int *sz_worddata) > > + progerr("Internal error in remove_worddata_longs"); > > + > > + /* dst may be smaller than src. So move the data */ > > +- memcpy(dst,src,data_len); > > ++ memmove(dst,src,data_len); > > LGTM > > > + > > + /* Increase pointers */ > > + src += data_len; > > +--- a/src/headers.c > > ++++ a/src/headers.c > > +@@ -280,7 +280,7 @@ static SWISH_HEADER_VALUE fetch_single_header( IndexFILE *indexf, HEADER_MAP *he > > + > > + case SWISH_NUMBER: > > + case SWISH_BOOL: > > +- value.number = *(unsigned long *) data_pointer; > > ++ value.number = *(unsigned int *) data_pointer; > > Could there be any risk in reducing the size of the variable like this? Assuming the value is indeed a boolean, probably not. > > > + > > + /* $$$ Ugly hack alert! */ > > + /* correct for removed files */ > > +--- a/src/swishspider > > ++++ a/src/swishspider > > +@@ -27,6 +27,7 @@ use LWP::UserAgent; > > + use HTTP::Status; > > + use HTML::Parser 3.00; > > + use HTML::LinkExtor; > > ++use Encode; > > + > > + if (scalar(@ARGV) != 2) { > > + print STDERR "Usage: $0 localpath url\n"; > > +@@ -94,7 +95,7 @@ use HTML::LinkExtor; > > + # Don't allow links above the base > > + $URI::ABS_REMOTE_LEADING_DOTS = 1; > > + > > +- $p->parse( $$content_ref ); > > ++ $p->parse( decode_utf8 $$content_ref ); > > Can you explain why we need this? Presumably to better handle utf8-encoded input. Tomb developers have expressed interest in replacing their use of swish-e with the "Recoll" search tool https://github.com/dyne/Tomb/issues/211. If maintenance of this package turns out to be burdensome, we might be able to drop it. Thanks for reviewing, `~Eric