* Performance
From: Cecil Westerhof @ 2010-06-18 20:50 UTC
  To: guile-user

I have the following code:
    #!/usr/bin/guile \
    -e main -s
    !#
    (use-modules (ice-9 rdelim))
    (use-modules (ice-9 regex))

    (define (main args)
      (let* ((arg-vector       (list->vector args))
             (input-file-name  (vector-ref   arg-vector 1))
             (output-file-name (vector-ref   arg-vector 2))
             (reg-exp          (make-regexp  (vector-ref   arg-vector 3)))
             (substitute-str   (vector-ref   arg-vector 4))

             (end-match        0)
             (found-match      #f)
             (input-file       (open-file input-file-name  "r"))
             (match-length     0)
             (output-file      (open-file output-file-name "w"))
             (start-match      0)
             (this-line        ""))
        (while (not (eof-object? (peek-char input-file)))
               (set! this-line   (read-line input-file))
    #!
               (set! found-match (regexp-exec reg-exp this-line))
               (while found-match
                      (set! start-match  (match:start found-match))
                      (set! end-match    (match:end found-match))
                      (set! match-length (- end-match start-match))
                      (while (> match-length (string-length substitute-str))
                             (set! substitute-str (string-append substitute-str substitute-str)))
                      (set! found-match  (regexp-exec reg-exp this-line (+ end-match 1))))
    !#
               (write-line this-line output-file))
        (close-port output-file)
        (close-port input-file)))

When run like this, it takes less than 20 seconds to process a
5,000,000-line file.

When also executing:
               (set! found-match (regexp-exec reg-exp this-line))
it takes 33 seconds.

When also executing:
               (while found-match
                      (set! start-match  (match:start found-match))
                      (set! end-match    (match:end found-match))
and:
                      (set! found-match  (regexp-exec reg-exp this-line (+ end-match 1))))
it takes 75 seconds.

And when executing all the code, it takes 95 seconds.

Why is this so expensive? I thought Guile was very efficient, but as
soon as the program does more than just copy lines, it becomes much
slower. Am I doing something wrong?

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof



* Re: Performance
From: Andy Wingo @ 2010-06-19  9:16 UTC
  To: Cecil Westerhof; +Cc: guile-user

On Fri 18 Jun 2010 22:50, Cecil Westerhof <Cecil@decebal.nl> writes:

> Why is this so expensive?

The general answer to this question can be found by profiling. You
should factor your code into a function, then from the repl:

  ,profile (call-my-function)

I wonder, perhaps we should have a --profile command-line flag...
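
A minimal sketch of what "factor your code into a function" could look
like for the script in this thread. The names count-matching-lines and
profile-me and the three-line sample input are made up for illustration,
and the ,profile meta-command assumes a Guile 2.0 REPL.

```scheme
(use-modules (ice-9 rdelim))

;; Hypothetical helper: the per-line work lives in a procedure of its own,
;; so it can be called (and profiled) without going through `main'.
(define (count-matching-lines port rx)
  (let loop ((line (read-line port)) (hits 0))
    (if (eof-object? line)
        hits
        (loop (read-line port)
              (if (regexp-exec rx line) (+ hits 1) hits)))))

;; Hypothetical zero-argument entry point. A real run would open the
;; large input file here instead of reading from a short string.
(define (profile-me)
  (call-with-input-string "  a\nb\n   c\n"
    (lambda (port)
      (count-matching-lines port (make-regexp "^ +")))))
```

With this loaded, typing ,profile (profile-me) at the REPL should report
which procedures the time is spent in.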

Andy
-- 
http://wingolog.org/



* Re: Performance
From: Cecil Westerhof @ 2010-06-19 15:05 UTC
  To: guile-user

On Saturday 19 Jun 2010 11:16 CEST, Andy Wingo wrote:

> On Fri 18 Jun 2010 22:50, Cecil Westerhof <Cecil@decebal.nl> writes:
>
>> Why is this so expensive?
>
> The general answer to this question can be found by profiling. You
> should factor your code into a function, then from the repl:
>
> ,profile (call-my-function)
>
> I wonder, perhaps we should have a --profile command-line flag...

When calling my script with:
    dummy.scm "temp/input" "dummy.log" "^ +" "1234567890"
it is just executed.

When starting Guile, I give:
    (load "bin/dummy.scm")
and then:
    (main "temp/input" "dummy.log" "^ +" "1234567890")
and I get:
    Backtrace:
    In standard input:
      10: 0* [main "temp/input" "dummy.log" "^ +" "1234567890"]

    standard input:10:1: In procedure main in expression (main "temp/input" "dummy.log" ...):
    standard input:10:1: Wrong number of arguments to #<procedure main (args)>
    ABORT: (wrong-number-of-args)

When I use:
    (main ("temp/input" "dummy.log" "^ +" "1234567890"))
I get:
    Backtrace:
    In standard input:
      11: 0* [main ...
      11: 1*  ["temp/input" "dummy.log" "^ +" "1234567890"]

    standard input:11:7: In expression ("temp/input" "dummy.log" "^ +" ...):
    standard input:11:7: Wrong type to apply: "temp/input"
    ABORT: (misc-error)

So how should I call it from the REPL?

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof



* Re: Performance
From: Thien-Thi Nguyen @ 2010-06-19 15:44 UTC
  To: Cecil Westerhof; +Cc: guile-user

() Cecil Westerhof <Cecil@decebal.nl>
() Sat, 19 Jun 2010 17:05:50 +0200

       (main ("temp/input" "dummy.log" "^ +" "1234567890"))

To answer this, you can try the following experiment:

  $ cat > program <<EOF
  (define (main args) (write args) (newline) (exit #t))
  (main (command-line))
  EOF
  $ guile -s program some args

Re performance, take a look at the lower-level procedures used to
implement the high-level ‘read-line’.  The lowest ones require an
explicit buffer to be passed in by the caller.  If you modify your
program to use these, you can control the timing and frequency of
that buffer's allocation, and thus improve the program's performance.
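
A rough sketch of reading with a caller-supplied buffer, assuming
read-line! from (ice-9 rdelim) is the kind of lower-level procedure
meant here. The helper name count-chars and the 4096-character buffer
size are made up for illustration. A line longer than the buffer comes
back in several pieces, which a real program would have to stitch
together.

```scheme
(use-modules (ice-9 rdelim))

;; Hypothetical helper: count the characters on all lines of PORT,
;; reusing one buffer instead of allocating a fresh string per line.
(define (count-chars port)
  (let ((buf (make-string 4096)))
    (let loop ((total 0))
      (let ((n (read-line! buf port)))
        (cond ((eof-object? n) total)
              ;; Defensive: treat a #f result as "buffer completely filled".
              ((not n) (loop (+ total (string-length buf))))
              ;; N characters were stored in BUF, from index 0 up to N.
              (else (loop (+ total n))))))))

(display (call-with-input-string "ab\ncde\n" count-chars))
(newline)
```

The buffer is allocated once, outside the loop, so how often it is
allocated is under the program's control.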

thi



* Re: Performance
From: Andy Wingo @ 2010-06-19 18:16 UTC
  To: Cecil Westerhof; +Cc: guile-user

Hello,

On Sat 19 Jun 2010 17:05, Cecil Westerhof <Cecil@decebal.nl> writes:

> On Saturday 19 Jun 2010 11:16 CEST, Andy Wingo wrote:
>
>> ,profile (call-my-function)
>
>     (main ("temp/input" "dummy.log" "^ +" "1234567890"))

Almost. At the repl, type:

 (load "dummy.scm")

Then:

 ,profile (main '("dummy.scm" "temp/input" "dummy.log" "^ +" "1234567890"))

You were missing a quote, and the equivalent of argv[0].
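
A small sketch of both points. The main below is a hypothetical
stand-in for the script's main, not the original code: it only shows
that the procedure receives a single list whose first element plays the
role of argv[0].

```scheme
;; Hypothetical stand-in for the script's `main': ARGS is one list,
;; shaped like the result of (command-line).
(define (main args)
  (let ((program   (car args))    ; the equivalent of argv[0]
        (user-args (cdr args)))   ; what the script actually works with
    (list program (length user-args))))

;; The quote keeps the list from being evaluated as a procedure call,
;; which is what caused "Wrong type to apply".
(main '("dummy.scm" "temp/input" "dummy.log" "^ +" "1234567890"))

;; Equivalent call that builds the list explicitly.
(main (list "dummy.scm" "temp/input" "dummy.log" "^ +" "1234567890"))
```

Without the leading "dummy.scm" element, every index used by the script
would be off by one.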

More repl help is available by typing ,help. ,help profile, for example.

Andy
-- 
http://wingolog.org/



* Re: Performance
From: Cecil Westerhof @ 2010-06-21  9:41 UTC
  To: guile-user

Sending this again, because I sent the first copy only to Thien-Thi
instead of to the mailing list.

On Saturday 19 Jun 2010 17:44 CEST, Thien-Thi Nguyen wrote:

> Re performance, take a look at the lower-level procedures used to
> implement the high-level ‘read-line’.  The lowest ones require an
> explicit buffer to be passed in by the caller.  If you modify your
> program to use these, you can control the timing and frequency of
> that buffer's allocation, and thus improve the program's performance.

But input/output does not seem to be the problem. Just adding this
statement:
               (set! found-match (regexp-exec reg-exp this-line))
makes the program take 65% more time. So it looks like regexps are very
expensive.
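
One way to check this without any file I/O would be to time regexp-exec
on its own. This is only a sketch: the helper names seconds-for and
exec-many, the sample line and the iteration count are made up for
illustration, and no timings are claimed here.

```scheme
;; Run THUNK once and return the elapsed wall-clock time in seconds.
(define (seconds-for thunk)
  (let ((start (get-internal-real-time)))
    (thunk)
    (exact->inexact (/ (- (get-internal-real-time) start)
                       internal-time-units-per-second))))

;; Call regexp-exec N times on the same line; return the number of matches.
(define (exec-many rx line n)
  (let loop ((i 0) (hits 0))
    (if (= i n)
        hits
        (loop (+ i 1)
              (if (regexp-exec rx line) (+ hits 1) hits)))))

(define sample-rx (make-regexp "^ +"))

(display (seconds-for (lambda () (exec-many sample-rx "    some text" 100000))))
(newline)
```

Scaling the count up to 5,000,000 and comparing the result with the 13
seconds that the extra statement costs would show how much of that time
is regexp-exec itself.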

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof



* Re: Performance
From: Cecil Westerhof @ 2010-06-21  9:48 UTC
  To: guile-user

On Saturday 19 Jun 2010 20:16 CEST, Andy Wingo wrote:

>> On Saturday 19 Jun 2010 11:16 CEST, Andy Wingo wrote:
>>
>>> ,profile (call-my-function)
>>
>> (main ("temp/input" "dummy.log" "^ +" "1234567890"))
>
> Almost. At the repl, type:
>
> (load "dummy.scm")
>
> Then:
>
> ,profile (main '("dummy.scm" "temp/input" "dummy.log" "^ +" "1234567890"))

This gives:
    Backtrace:
    In standard input:
       2: 0* (unquote profile)

    standard input:2:2: In expression (unquote profile):
    standard input:2:2: Unbound variable: unquote
    ABORT: (unbound-variable)

> More repl help is available by typing ,help. ,help profile, for example.

Gives:
    Backtrace:
    In standard input:
       1: 0* (unquote help)

    standard input:1:1: In expression (unquote help):
    standard input:1:1: Unbound variable: unquote
    ABORT: (unbound-variable)
and:
    Backtrace:
    In standard input:
       2: 0* (unquote help)

    standard input:2:1: In expression (unquote help):
    standard input:2:1: Unbound variable: unquote
    ABORT: (unbound-variable)
    guile> ERROR: Unbound variable: profile
    ABORT: (unbound-variable)

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof



* Re: Performance
From: Andy Wingo @ 2010-06-21 19:34 UTC
  To: Cecil Westerhof; +Cc: guile-user

Hello,

On Mon 21 Jun 2010 11:48, Cecil Westerhof <Cecil@decebal.nl> writes:

>     standard input:2:2: In expression (unquote profile):
>     standard input:2:2: Unbound variable: unquote
>     ABORT: (unbound-variable)

Ah, I didn't know you were using Guile 1.8. The Guile 2.0 snapshots are
faster, and they have a more pleasant repl. Use Guile from git :)
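
A quick way to see which Guile a REPL is running, as a sketch. The
helper name guile-2-or-newer? is made up for illustration; version and
major-version are standard Guile procedures.

```scheme
;; Hypothetical helper: true when the running Guile is 2.0 or newer.
;; (major-version) returns a string such as "1" or "2".
(define (guile-2-or-newer?)
  (>= (string->number (major-version)) 2))

(display (version))
(newline)
(display (if (guile-2-or-newer?)
             "REPL meta-commands such as ,profile should be available"
             "Guile 1.x REPL: ,profile is read as (unquote profile)"))
(newline)
```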

Andy
-- 
http://wingolog.org/



* Re: Performance
From: Decebal @ 2010-06-21 20:34 UTC
  To: guile-user

----- Original message -----
> On Mon 21 Jun 2010 11:48, Cecil Westerhof <Cecil@decebal.nl> writes:
> 
> > standard input:2:2: In expression (unquote profile):
> > standard input:2:2: Unbound variable: unquote
> > ABORT: (unbound-variable)
> 
> Ah, I didn't know you were using Guile 1.8. The Guile 2.0 snapshots are
> faster, and they have a more pleasant repl. Use Guile from git :)

I thought that 1.8 was the latest. I'll try 2.0. And a better REPL would be very nice.

-- 
Sent from my Nokia N900




* Binary packages of Guile development snapshots? (was Re: Performance)
From: Štěpán Němec @ 2010-06-22 11:33 UTC
  To: Andy Wingo; +Cc: guile-user

Andy Wingo <wingo@pobox.com> writes:

> Hello,
>
> On Mon 21 Jun 2010 11:48, Cecil Westerhof <Cecil@decebal.nl> writes:
>
>>     standard input:2:2: In expression (unquote profile):
>>     standard input:2:2: Unbound variable: unquote
>>     ABORT: (unbound-variable)
>
> Ah, I didn't know you were using Guile 1.8. The Guile 2.0 snapshots are
> faster, and they have a more pleasant repl. Use Guile from git :)

On that note, does anybody know if there are any pre-built deb (or other
"binary", for that matter) packages of Guile development snapshots
available? Searching the net doesn't give much hope. I'd like to avoid
building Guile (tried it, wasn't a pleasant experience).


    Štěpán



* Re: Binary packages of Guile development snapshots?
From: Andy Wingo @ 2010-06-22 19:04 UTC
  To: Štěpán Němec; +Cc: guile-user, Rob Browning

On Tue 22 Jun 2010 13:33, Štěpán Němec <stepnem@gmail.com> writes:

> Does anybody know if there are any pre-built deb (or other "binary",
> for that matter) packages of Guile development snapshots available?
> Searching the net doesn't give much hope. I'd like to avoid building
> Guile (tried it, wasn't a pleasant experience).

Not that I am aware of. I have copied Rob, the Debian package
maintainer; hopefully he or someone else in Debian will find time to do
this.

Andy
-- 
http://wingolog.org/


