From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dmitry Antipov
Newsgroups: gmane.emacs.devel
Subject: Re: More (de)compress?
Date: Tue, 20 Aug 2013 12:19:35 +0400
Message-ID: <52132697.4050000@yandex.ru>
References: <52120CEA.6060701@yandex.ru> <52124ACD.20502@cs.ucla.edu>
Cc: Emacs development discussions
To: Paul Eggert
In-Reply-To: <52124ACD.20502@cs.ucla.edu>

On 08/19/2013 08:41 PM, Paul Eggert wrote:

> * It can be faster to compress using an external program,
>   since the compression can be done in parallel.  Have you
>   timed your compression approach on a multicore platform,
>   and compared its real time to doing it with external
>   compression?  (Similarly for decompression, though I
>   expect there we won't find the external program faster.)
>   You might try "pigz" for compression, since it's multicore
>   internally.

It's faster because the buffer machinery is slower than the external
compression program's input reader (at least in the case of gzip). I
tried to compress 959 small text files (~16 MB in total) with
'gzip *.txt' (0.67s), dired-compress-file (6.15s, but don't forget about
the fork+exec overhead) and a simple ad-hoc function using
compress-region and the zlib method:

    (defun compress-file (name method)
      (message "Compress %s" name)
      (let ((ext (cdr (assoc method '((zlib . "gz") (bzlib . "bz2") (lzma . "xz")))))
            (buffer (find-file-literally name)))
        (when (null ext)
          (error "Unsupported compression method '%S'" method))
        (save-excursion
          (set-buffer buffer)
          (compress-region method)
          (delete-file (buffer-file-name))
          (rename-buffer (concat (buffer-name) "." ext))
          (write-file (concat (buffer-file-name) "." ext))
          (kill-buffer))))

The latter version takes ~19s. So, unfortunately, internal compression
support can't replace calls to external programs, especially in batch
operations where we need to (de)compress multiple files at once. But
internal compression should have some advantages when we just need to
show the contents of a compressed buffer (I haven't checked this yet,
BTW).

> * There seems to be quite a bit of repetition in configure.ac
>   and in the C code -- each compression package does pretty
>   much the same thing with respect to allocating buffers,
>   saving point, etc.  Could this be factored out to simplify
>   the code and make it easier to add future compression
>   algorithms?

Yes.

> * bzlib_detect and lzm_detect mishandle the case where the
>   buffer gap is located very near the start of the buffer.

Argh, yes.

> * If the buffer contains random garbage,
>     (decompress-region nil 1 100000)
>   signals "Unsupported decompression method", which
>   isn't very clear.  It should signal something like
>   "Unknown compression format".
>
> * The functions compress-region and decompress-region
>   should be defined on all platforms, even those that
>   lack all compression libraries.
>   They'll simply return
>   nil on such platforms, since they can't compress or
>   decompress anything.  This simplifies the C code and
>   will simplify Lisp code too.

OK.

Dmitry