From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Re: More (de)compress?
Date: Mon, 19 Aug 2013 09:41:49 -0700
Organization: UCLA Computer Science Department
Message-ID: <52124ACD.20502@cs.ucla.edu>
References: <52120CEA.6060701@yandex.ru>
To: Dmitry Antipov
Cc: Emacs development discussions

Thanks for taking this on. Some comments on the patch, in addition to Eli's:

* It can be faster to compress using an external program, since the compression can be done in parallel. Have you timed your compression approach on a multicore platform, and compared its real time to doing it with external compression? (Similarly for decompression, though I expect there we won't find the external program faster.) You might try "pigz" for compression, since it's multicore internally.

* There seems to be quite a bit of repetition in configure.ac and in the C code -- each compression package does pretty much the same thing with respect to allocating buffers, saving point, etc. Could this be factored out to simplify the code and make it easier to add future compression algorithms?

* bzlib_detect and lzm_detect mishandle the case where the buffer gap is located very near the start of the buffer.

* If the buffer contains random garbage, (decompress-region nil 1 100000) signals "Unsupported decompression method", which isn't very clear. It should signal something like "Unknown compression format".

* The functions compress-region and decompress-region should be defined on all platforms, even those that lack all compression libraries. They'll simply return nil on such platforms, since they can't compress or decompress anything. This simplifies the C code and will simplify Lisp code too.
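
To make the gap and "unknown format" points concrete, here is a minimal sketch, not the patch's code. It assumes a simplified gap-buffer struct rather than Emacs's real buffer representation, and the names gap_buffer, copy_prefix and detect_format are hypothetical. The idea is to copy the leading bytes into a small scratch array, stepping over the gap, before comparing magic numbers, so a gap that falls inside the first few bytes cannot confuse detection. Unrecognized input yields a distinct "unknown" result that a caller could report separately from "format known but library not built in".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical gap buffer: the logical text is data[0..gap_start)
   followed by data[gap_end..size).  Not Emacs's real structure.  */
struct gap_buffer
{
  const unsigned char *data;
  size_t gap_start;   /* index of the first byte of the gap */
  size_t gap_end;     /* index of the first byte after the gap */
  size_t size;        /* total allocated bytes, gap included */
};

enum format { FORMAT_UNKNOWN, FORMAT_GZIP, FORMAT_BZIP2, FORMAT_XZ };

/* Copy up to N logical bytes from the start of B into DST, skipping
   the gap.  Return the number of bytes actually copied.  */
static size_t
copy_prefix (const struct gap_buffer *b, unsigned char *dst, size_t n)
{
  assert (b->gap_start <= b->gap_end && b->gap_end <= b->size);

  /* Bytes available before the gap.  */
  size_t before = b->gap_start < n ? b->gap_start : n;
  memcpy (dst, b->data, before);

  /* Remaining bytes come from after the gap.  */
  size_t after_avail = b->size - b->gap_end;
  size_t after = n - before < after_avail ? n - before : after_avail;
  memcpy (dst + before, b->data + b->gap_end, after);

  return before + after;
}

/* Identify the compression format from its leading magic bytes.
   Return FORMAT_UNKNOWN for anything unrecognized.  */
static enum format
detect_format (const struct gap_buffer *b)
{
  static const unsigned char gzip_magic[] = { 0x1f, 0x8b };
  static const unsigned char bzip2_magic[] = { 'B', 'Z', 'h' };
  static const unsigned char xz_magic[] = { 0xfd, '7', 'z', 'X', 'Z', 0x00 };
  unsigned char head[sizeof xz_magic];
  size_t len = copy_prefix (b, head, sizeof head);

  if (len >= sizeof xz_magic
      && memcmp (head, xz_magic, sizeof xz_magic) == 0)
    return FORMAT_XZ;
  if (len >= sizeof bzip2_magic
      && memcmp (head, bzip2_magic, sizeof bzip2_magic) == 0)
    return FORMAT_BZIP2;
  if (len >= sizeof gzip_magic
      && memcmp (head, gzip_magic, sizeof gzip_magic) == 0)
    return FORMAT_GZIP;
  return FORMAT_UNKNOWN;
}
```

With this shape, a buffer whose gap sits after the first byte of a bzip2 stream is still recognized, and random garbage maps to FORMAT_UNKNOWN, which the caller could turn into an "Unknown compression format" error.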