From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Nala Ginrut Newsgroups: gmane.lisp.guile.devel Subject: Re: Extremly slow for format & string-join Date: Mon, 01 Apr 2013 17:52:10 +0800 Organization: HFG Message-ID: <1364809930.4639.17.camel@Renee-desktop.suse> References: <1364788801.4639.6.camel@Renee-desktop.suse> <87obdy3aw9.fsf@tines.lan> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Trace: ger.gmane.org 1364809948 22129 80.91.229.3 (1 Apr 2013 09:52:28 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Mon, 1 Apr 2013 09:52:28 +0000 (UTC) Cc: guile-devel@gnu.org To: Mark H Weaver Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Mon Apr 01 11:52:55 2013 Return-path: Envelope-to: guile-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1UMbQ5-0000Gv-KO for guile-devel@m.gmane.org; Mon, 01 Apr 2013 11:52:53 +0200 Original-Received: from localhost ([::1]:57952 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UMbPh-00081y-6c for guile-devel@m.gmane.org; Mon, 01 Apr 2013 05:52:29 -0400 Original-Received: from eggs.gnu.org ([208.118.235.92]:56406) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UMbPY-00080R-Hi for guile-devel@gnu.org; Mon, 01 Apr 2013 05:52:24 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1UMbPS-00075s-Pe for guile-devel@gnu.org; Mon, 01 Apr 2013 05:52:20 -0400 Original-Received: from mail-pb0-f42.google.com ([209.85.160.42]:61034) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UMbPS-00075Y-Hf for guile-devel@gnu.org; Mon, 01 Apr 2013 05:52:14 -0400 Original-Received: by mail-pb0-f42.google.com with SMTP id up7so463485pbc.29 for ; Mon, 01 Apr 2013 02:52:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:message-id:subject:from:to:cc:date:in-reply-to :references:organization:content-type:x-mailer:mime-version :content-transfer-encoding; bh=82cVSUZ7F4XTS+6MtnOXgq0/cc/JhxtRMB4yPFqi0GU=; b=u5KPLRbf1lrFoSKKAJWKAQLquyLra6I2HHrQOnj6Px+DnyL4vRCabCdK6EPfYgZ+k0 TFBexgf+xvwaSKkawBELY/sDaQM+jOm2o+mGMM5od0t2vPqBAaOxJ4tX0bI2SGOcfodh 0/ZKtnzzYihYhd5fFnk6nHMq6KEusMfmu0qq9E2SofY7QJyoQpUp/93exIffNTwoPZnA tF2RZkcSHcI+BiAIZvhxLG9tUfQ6pjgqaBA/kUVqt2yYCnjyp5IwQBL2DwfGyV2MnbmF afIQvgIl+GVUmOlbFzUwPzadIYszvM8Dxit08SG+ND2gdU5+FsPr6ZeDGAMzwvGEL9bP wdNw== X-Received: by 10.66.123.5 with SMTP id lw5mr18126608pab.132.1364809933746; Mon, 01 Apr 2013 02:52:13 -0700 (PDT) Original-Received: from [147.2.147.112] ([61.14.130.226]) by mx.google.com with ESMTPS id i9sm14699513paa.7.2013.04.01.02.52.11 (version=SSLv3 cipher=RC4-SHA bits=128/128); Mon, 01 Apr 2013 02:52:12 -0700 (PDT) In-Reply-To: <87obdy3aw9.fsf@tines.lan> X-Mailer: Evolution 3.4.4 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 209.85.160.42 X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.devel:16089 Archived-At: On Mon, 2013-04-01 at 04:36 -0400, Mark H Weaver wrote: > Nala Ginrut writes: > > > I've tried to implement a function to mimic string multiply like Python: > > "asdf" * 10 > > > > --------------code---------------- > > (define (str* str n) > > (format #f "~{~a~}" (make-list n str))) > > > > or > > > > (define (str* str n) > > (string-join (make-list n str) "")) > > --------------end----------------- > > > > > > Both are very slow when N is large (> 1000000). > > Indeed, the implementation of 'string-join' was very bad: about O(n^2) > in the length of the list (assuming that the strings are roughly the > same length). Thanks for bringing this to my attention. The problem > was that it called 'string-append' repeatedly, adding one component at a > time to the result string. Since each call to 'string-append' copied > the source strings into a fresh new string, this resulted in a lot of > unnecessary copying and allocation. > > I just pushed a much faster O(n) implementation to stable-2.0, which > instead constructs a list of strings, and then calls 'string-append' > only once. > > http://git.savannah.gnu.org/gitweb/?p=guile.git;a=commit;h=786ab4258fbf605f46287da5e7550d3ab4b68589 > > On my system, this makes (string-join (make-list 100000 "test") "-") > over 3000 times faster (about 28.5 milliseconds vs about 98 seconds). > I expect that the same test with 1,000,000 elements would be about > 30,000 times faster (roughly 2.7 hours vs 0.3 seconds), but I didn't > have the patience to wait 2.7 hours to verify this :) > Thanks Mark! string-join is a common thing for text processing(include web develop). However, our powerful 'format' is not so efficient. I do think it's necessary to we spend some time on it. > Before: > > scheme@(guile-user)> ,time (define s (string-join (make-list 10000 "test") "-")) > ;; 0.998800s real time, 0.996677s run time. 0.984885s spent in GC. > scheme@(guile-user)> ,time (define s (string-join (make-list 100000 "test") "-")) > ;; 98.006569s real time, 97.817077s run time. 97.795970s spent in GC. > > After: > > scheme@(guile-user)> ,time (define s (string-join (make-list 10000 "test") "-")) > ;; 0.006362s real time, 0.006351s run time. 0.000000s spent in GC. > scheme@(guile-user)> ,time (define s (string-join (make-list 100000 "test") "-")) > ;; 0.028513s real time, 0.028457s run time. 0.022235s spent in GC. > scheme@(guile-user)> ,time (define s (string-join (make-list 1000000 "test") "-")) > ;; 0.303098s real time, 0.302543s run time. 0.289639s spent in GC. > scheme@(guile-user)> ,time (define s (string-join (make-list 10000000 "test") "-")) > ;; 3.288105s real time, 3.281922s run time. 3.174460s spent in GC. > > Format is still slow for large numbers of elements, but I'm not > sufficiently motivated to dive into that swamp right now. > > Thanks, > Mark