unofficial mirror of guile-user@gnu.org 
* speed difference between Guile and Racket (and Python)
@ 2022-11-06 16:01 Damien Mattei
  2022-11-06 22:23 ` Speed " Hans Åberg
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Damien Mattei @ 2022-11-06 16:01 UTC
  To: guile-user

hello,

when comparing (almost) the same code running on Guile and Racket, I find
a big speed difference. I am still computing some logic expressions Cn
minimized in disjunctive normal form:

(' = minutes, " = seconds, // = parallel run)

C9:  35"       Guile, macOS Apple silicon

C10: 37'       Guile, macOS Apple silicon
     10"       Guile //, macOS Apple silicon (8 cores)
     4"        Racket //, macOS Apple silicon

C11: 1'17"     Guile //, macOS Apple silicon
     56"       Guile //, Linux Intel (6 cores)
     11"       Racket //, macOS Apple silicon
     22"       Python SymPy (no //), macOS Apple silicon

C12: 1'24"     Racket //, macOS Apple silicon
     1'34"     Racket, macOS Apple silicon
     1'10"     Python SymPy (no //), macOS Apple silicon
     9'25"     Guile //, macOS Apple silicon

C13: 17', 20', 24' (uses < 8 GB of memory)   Racket, macOS Apple silicon
     15'37", 16'10"   Racket //, macOS Apple silicon
     7'50"     Python SymPy (no //), macOS Apple silicon

par-map:
  test: succeeds
  computation: very slow

threads:
  test: blocked
  computation: partial, then crash

My conclusion about parallelism is that in both Guile and Racket my
parallel scheme is not good: I get poor gains.

The strange thing was why in Guile I had:
C10: 37'  Guile, macOS Apple silicon
     10"  Guile //, macOS Apple silicon (8 cores)
that is, 37' for the sequential code and 10" for the parallel code, with
only 8 cores to speed it up. I think it is because the parallel code uses
vectors and the sequential code uses lists.
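A quick arithmetic check on the C10 figures above (a sketch that assumes ideal linear scaling, which real code rarely exceeds): going from 37' to 10" is roughly a 222x speedup, far more than 8 cores can deliver, so most of the gain must come from something other than parallelism. This is consistent with the lists-versus-vectors explanation, though it does not prove it.

```python
# C10 Guile timings from the list above, converted to seconds.
sequential_s = 37 * 60   # 37' sequential run (lists)
parallel_s = 10          # 10" parallel run (vectors, 8 cores)
cores = 8

observed = sequential_s / parallel_s  # measured speedup
ideal = cores                         # best case with perfect linear scaling
excess = observed / ideal             # part of the speedup parallelism cannot explain

print(f"observed {observed:.0f}x, ideal {ideal}x, unexplained factor {excess:.2f}x")
```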

So now the question is: why is Guile slow compared to Racket? Is it again
about lists versus vectors, or not?
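To make the lists-versus-vectors hypothesis concrete, here is a small sketch. It is written in Python only so it is easy to run, and it models Scheme pairs with a hypothetical `Pair` class. It assumes, hypothetically, that the hot loop reads elements by index (as `list-ref` does). If the Scheme code instead walks lists with `car`/`cdr`, this particular quadratic cost does not apply.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Pair:
    """A minimal cons cell standing in for a Scheme pair."""
    car: Any
    cdr: Optional["Pair"]


def from_iterable(items):
    """Build a singly linked list (like Scheme's `list`) from an iterable."""
    head = None
    for x in reversed(list(items)):
        head = Pair(x, head)
    return head


def list_ref(lst, k):
    """Like Scheme's list-ref: follow k links from the head.
    Returns (value, number_of_links_followed)."""
    hops = 0
    while k > 0:
        lst = lst.cdr
        k -= 1
        hops += 1
    return lst.car, hops


def hops_for_indexed_scan(n):
    """Total links followed when reading every element by index 0..n-1."""
    lst = from_iterable(range(n))
    total = 0
    for i in range(n):
        _, hops = list_ref(lst, i)
        total += hops
    return total


# A vector gives O(1) access, so n reads cost n accesses.
# The linked list costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 link traversals.
for n in (500, 1000, 2000):
    print(n, "vector accesses:", n, "list hops:", hops_for_indexed_scan(n))
```

Doubling `n` roughly quadruples the list cost while the vector cost only doubles. That is the kind of gap that could turn minutes into seconds on large inputs.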

Compared with Python SymPy (no parallel support), Racket is in the same
order of magnitude (parallelism changes almost nothing: 16' versus 17' for
C13), but Python is about twice as fast... even though Python is known to
be slow (it is not compiled code).

I'm running Guile in the interpreter (same for Racket). Would it be faster
when run from the command line? Sorry if the question is stupid. I know
Bigloo can compile code rather than interpret it, but I do not know about
Guile; each time I modify my code it seems to get compiled (message: ;;;
compiling......).

The latest version of the code is here:
https://github.com/damien-mattei/library-FunctProg/blob/master/guile/logiki%2B.scm#L3092

Best regards,
Damien



* Re: Speed difference between Guile and Racket (and Python)
  2022-11-06 16:01 speed difference between Guile and Racket (and Python) Damien Mattei
@ 2022-11-06 22:23 ` Hans Åberg
  2022-11-06 22:35   ` Damien Mattei
  2022-11-07  9:21 ` speed " Dr. Arne Babenhauserheide
  2022-11-08  8:03 ` Linus Björnstam
  2 siblings, 1 reply; 7+ messages in thread
From: Hans Åberg @ 2022-11-06 22:23 UTC
  To: Damien Mattei; +Cc: guile-user


> On 6 Nov 2022, at 17:01, Damien Mattei <damien.mattei@gmail.com> wrote:
> 
> So now the question is why is Guile slow compared to Racket?

How is thread performance? The Boehm GC puts a lock around every memory allocation, which is slow under heavy use.





* Re: Speed difference between Guile and Racket (and Python)
  2022-11-06 22:23 ` Speed " Hans Åberg
@ 2022-11-06 22:35   ` Damien Mattei
  0 siblings, 0 replies; 7+ messages in thread
From: Damien Mattei @ 2022-11-06 22:35 UTC
  To: guile-user

At some point the threads block or even crash, but the sequential run is
also slow compared to the Racket one.



* Re: speed difference between Guile and Racket (and Python)
  2022-11-06 16:01 speed difference between Guile and Racket (and Python) Damien Mattei
  2022-11-06 22:23 ` Speed " Hans Åberg
@ 2022-11-07  9:21 ` Dr. Arne Babenhauserheide
  2022-11-07 13:23   ` Zelphir Kaltstahl
  2022-11-09 14:24   ` Damien Mattei
  2022-11-08  8:03 ` Linus Björnstam
  2 siblings, 2 replies; 7+ messages in thread
From: Dr. Arne Babenhauserheide @ 2022-11-07  9:21 UTC
  To: Damien Mattei; +Cc: guile-user



Damien Mattei <damien.mattei@gmail.com> writes:

> when comparing the (almost) same code running on Guile and Racket i find
> big speed difference:

Schemes differ a lot in speed across different tasks, but Racket is one
of the fastest. A factor-of-2 difference sounds plausible. For a
comparison, see the r7rs benchmarks:
https://ecraven.github.io/r7rs-benchmarks/

That said, I have seen 10x speedups in Guile code when people set out to
optimize it.

> last version of code is here:
> https://github.com/damien-mattei/library-FunctProg/blob/master/guile/logiki%2B.scm#L3092

Could you give the shell commands to setup and run your speed-test?

Best wishes,
Arne
-- 
Being unpolitical
means being political
without noticing it.
draketo.de


* Re: speed difference between Guile and Racket (and Python)
  2022-11-07  9:21 ` speed " Dr. Arne Babenhauserheide
@ 2022-11-07 13:23   ` Zelphir Kaltstahl
  2022-11-09 14:24   ` Damien Mattei
  1 sibling, 0 replies; 7+ messages in thread
From: Zelphir Kaltstahl @ 2022-11-07 13:23 UTC
  To: Dr. Arne Babenhauserheide, Damien Mattei; +Cc: guile-user

Hi!

I think the only way to use multiple cores in Racket is to use "places" and that 
means starting new Racket VMs.

Lambdas are not easily serialized together with their environment, so it
is difficult to actually "send a lambda" to another "place" (Racket VM)
dynamically. The only way I found was not to send them at all, but to
predefine functions, so that they are already known in the other places
and do not need to be sent, and then to send only data.

This makes it difficult to build a pool of things (threads, workers,
etc.) that run on multiple cores and dynamically accept "work" to do.
(Has anyone done it, with dynamically calculated work being sent to the
threads/workers? I've not seen an example.) This was unsatisfactory for
me when implementing a decision tree algorithm, which is why I originally
took a look at Guile.

So I think as a consequence there is a lot of overhead when using multiple cores 
in Racket. At least initially, for starting Racket VMs (places).

(Correct me, if I am wrong about these things. It is only my experience from 
using Racket before using Guile, and I learned a lot since then.)

I would expect futures on Guile to have much less overhead than starting
Racket places, and fibers to have even less overhead than futures. But
this is only guessing. From that, I would expect multi-core workloads
with many separate tasks to run faster on Guile. But again, I am merely
guessing. If sequential execution is faster in Racket, there is probably
some point at which the gains from parallelizing outweigh Racket's faster
sequential execution.
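That break-even point can be sketched with a deliberately simple model. This is an assumption, not a measurement: the slower implementation is assumed to scale perfectly across `n` cores but to pay a fixed startup overhead, while the faster one runs sequentially. The function name and the numbers are illustrative only.

```python
import math


def min_cores_to_win(t_slow_seq, t_fast_seq, overhead=0.0):
    """Smallest core count n for which the slower runtime, parallelized with
    perfect linear scaling plus a fixed startup overhead, beats the faster
    runtime running sequentially. Returns None if no core count can win."""
    budget = t_fast_seq - overhead
    if budget <= 0:
        return None
    # Need t_slow_seq / n < budget, i.e. n > t_slow_seq / budget.
    return math.floor(t_slow_seq / budget) + 1


# Hypothetical: the slow runtime is 6x slower sequentially (time units arbitrary).
print(min_cores_to_win(6.0, 1.0))        # no overhead
print(min_cores_to_win(6.0, 1.0, 0.25))  # startup costs a quarter of the fast run
```

With a 6x sequential gap and no overhead, 7 cores are needed. Any real overhead or imperfect scaling pushes that number up.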

Regards,
Zelphir


-- 
repositories: https://notabug.org/ZelphirKaltstahl





* Re: speed difference between Guile and Racket (and Python)
  2022-11-06 16:01 speed difference between Guile and Racket (and Python) Damien Mattei
  2022-11-06 22:23 ` Speed " Hans Åberg
  2022-11-07  9:21 ` speed " Dr. Arne Babenhauserheide
@ 2022-11-08  8:03 ` Linus Björnstam
  2 siblings, 0 replies; 7+ messages in thread
From: Linus Björnstam @ 2022-11-08  8:03 UTC
  To: Damien Mattei, guile-user

Hi!

First of all, Guile is (currently) slower than Racket at many things. Running code in the interpreter should not be slower than running it from the command line.

One thing you could do is modularize the code. Currently every call to a top-level function carries some overhead, since Guile cannot know whether the function has been replaced via set!. Andy wrote about it here: http://wingolog.org/archives/2019/06/26/fibs-lies-and-benchmarks

Secondly, your code uses quite a lot of mutation. I got Scheme+ and macroexpanded your code. I didn't look for hot code, but there is liberal use of set!. That means you get a boxing overhead (which should be true for both Racket and Guile). Since I believe I read somewhere that Racket is better at type inference than mainline Chez, I think it is safe to say it probably handles mutation better than Guile does, at least locally. Most notably, the for loops are definitely slower than the equivalent tail-recursive let loop.

Also, for seems to expand into call/cc instead of delimited continuations (which both Guile and Racket support). The penalty for call/cc is low in Racket but high in Guile.


Best regards
  Linus Björnstam


* Re: speed difference between Guile and Racket (and Python)
  2022-11-07  9:21 ` speed " Dr. Arne Babenhauserheide
  2022-11-07 13:23   ` Zelphir Kaltstahl
@ 2022-11-09 14:24   ` Damien Mattei
  1 sibling, 0 replies; 7+ messages in thread
From: Damien Mattei @ 2022-11-09 14:24 UTC
  To: Dr. Arne Babenhauserheide; +Cc: guile-user

Sorry for my late answer, but I needed to test again (and modify the code
a bit). A few precisions:
- the code is perhaps hard to parallelize: a recursive algorithm versus an
imperative one
- only a portion of the code is parallel (I parallelized the unification of
minterms because it took really long to compute, but perhaps there are
other bottlenecks; perhaps some Cartesian products of sets need to be
parallelized too, but I did not do it; not all algorithms are easily
parallelizable)
- I no longer use threads or par-map, but futures with vectors
- the sequential code still uses lists

As it seems that parallelism only speeds up the code because I use
vectors instead of lists in the parallel portion, I decided to compare
vectors and lists. For this I run the parallel code with the number of
CPUs set to 1, so it behaves like the sequential code but still uses
vectors, and I can compare it with the sequential code that uses lists.

- Guile is 6x slower than Racket on this algorithm.
- Racket is 2x slower than Python SymPy, but I do not know if SymPy's
algorithm is the same as mine; SymPy is written in Python (not C), which
has a reputation for slowness, and it has no support for parallelism.
- I discovered one thing about Racket: I benchmarked versions 7.7 and 8.6,
and the latest version, which is based on Chez Scheme, is more than 5x
faster. I upgraded my Linux system to this version today, so I can no
longer test Racket 7.7's speed, but it still appears in the benchmarks
below.
- I made minor modifications to the code this week (removing useless
sorting of variables and expressions), improving speed by 10-20% on all
systems, so some older benchmarks may be inconsistent with newer ones.
- Strange thing: Python uses almost no memory while my algorithm uses
gigabytes... but I use the Quine-McCluskey and Petrick algorithms, which
are the conventional algorithms for this problem.
- @Linus: about Scheme+ and mutability, I also use it with Racket and it
is fast. The 'for' loops are various, some old (for/basic, for/break),
some recent, but all use a let loop inside, sometimes with call/cc, I
admit, but only when useful for breaking out of the loop. I read the link
you provided and it was very interesting.
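Because only part of the program is parallelized, Amdahl's law bounds the possible gain. The fractions below are hypothetical: the thread does not measure how much of the total run time the minterm unification accounts for.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup when only `parallel_fraction` of the
    sequential run time is spread evenly over `cores` workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Hypothetical parallel fractions on an 8-core machine, with the
# limit reached as the number of cores goes to infinity.
for p in (0.5, 0.8, 0.95):
    print(f"p={p}: 8 cores -> {amdahl_speedup(p, 8):.2f}x, limit -> {1 / (1 - p):.0f}x")
```

If, say, only half of the run time is parallel, 8 cores give less than a 2x speedup. That alone could explain "poor gain" without anything being wrong in the parallel code itself.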
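For readers unfamiliar with Quine-McCluskey, the "unification of minterms" mentioned above is presumably its combining step. Here is a minimal sketch of that step. The string representation of '0', '1' and 'x' per implicant is chosen here for illustration; Damien's Scheme code stores lists of 0, 1 and 'x instead.

```python
def combine(a, b):
    """One Quine-McCluskey merge step on implicants written as strings of
    '0', '1' and 'x' (don't care). Two implicants merge when they have 'x'
    in the same positions and differ in exactly one other position; that
    position becomes 'x'. Returns None when they cannot merge."""
    if len(a) != len(b):
        raise ValueError("implicants must have the same number of variables")
    diff = None
    for i, (p, q) in enumerate(zip(a, b)):
        if p == q:
            continue
        if p == "x" or q == "x" or diff is not None:
            return None  # 'x' positions differ, or a second difference
        diff = i
    if diff is None:
        return None  # identical implicants: nothing to merge
    return a[:diff] + "x" + a[diff + 1:]


print(combine("0101", "0111"))  # 01x1
print(combine("01x1", "00x1"))  # 0xx1
print(combine("0101", "0110"))  # None (they differ in two positions)
```

Each pass compares many pairs of implicants and allocates new ones. This is one plausible source of the memory use and GC pressure discussed in this thread.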

Here are the benchmarks. I used 2 systems:
- macOS, Apple silicon M1 (8 cores), 16 GB RAM, SSD: Guile 3.0.7,
  Racket 8.6, Python 3.9.6, SymPy 1.1
- Linux, Intel (6 cores), 16 GB RAM, SSD: Guile 3.0.1, Racket 7.7,
  Racket 8.6 (based on Chez Scheme!), Python 3.7.4, SymPy

C9:  35"             Guile, macOS Apple silicon M1
     50", 40" (without useless sorting of sets)   Guile, Linux Intel
     26"             Racket 7.7, Linux Intel

C10: 37'             Guile, macOS Apple silicon M1
     47'             Guile, Linux Intel
     10"             Guile // (8 cores), macOS Apple silicon M1
     9"              Guile // (1 core! vector), Linux Intel
     4"              Racket 8.6 //, macOS Apple silicon M1
     7"              Guile // (3 cores, vector), macOS Apple silicon M1
     43'             Racket 7.7, Linux Intel

C11: 1'17"           Guile //, macOS Apple silicon M1
     56"             Guile // (1 core! vector), Linux Intel
     1'              Guile // (1 core! vector), macOS Apple silicon
     1'04"           Guile // (3 cores, vector), macOS Apple silicon
     56"             Guile // (6 cores), Linux Intel
     1'08"           (1 core! vector), Linux Intel
     9"              Racket 8.6, macOS Apple silicon
     11"             Racket 8.6 //, macOS Apple silicon
     22"             Python 3.9.6, SymPy 1.1 (no //), macOS Apple silicon
     17"             Python, Linux Intel
     15"             Racket 8.6, Linux Intel

C12: 1'24", 1'20"    Racket 8.6 //, macOS Apple silicon
     1'34", 1'27"    Racket 8.6, macOS Apple silicon
     1'23"           Racket 8.6 // (1 core!), macOS Apple silicon
     1'10"           Python SymPy (no //), macOS Apple silicon
     9'25"           Guile //, macOS Apple silicon
     9', 9'45"       Guile (1 core! vector), Linux Intel
     8'37"           Guile // (3 cores, vector), macOS Apple silicon
     10'10"          Guile // (7 cores, vector), macOS Apple silicon
     8'33"           Guile // (1 core! vector), macOS Apple silicon
     2'10"           Python SymPy (no //), Linux Intel
     2'05"           Racket 8.6, Linux Intel

C13: 17', 20', 24' (uses < 8 GB of memory)   Racket, macOS Apple silicon
     15'37", 16'10", 14' (latest version of the code)   Racket //, macOS Apple silicon
     14'             Racket 8.6 // (1 core!), macOS Apple silicon
     7'50"           Python 3.9.6, SymPy 1.1 (no //), macOS Apple silicon
     1h37'           Guile (1 core! vector), Linux Intel
     17'20"          Python SymPy (no //), Linux Intel
     16'35"          Racket 8.6 (1 core!), Linux Intel
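The "6x" and "2x" ratios quoted earlier can be re-derived from the list above by comparing like with like (1-core runs on the same machine). The parser below is a hypothetical helper that accepts only the h / ' / " notation used in this thread.

```python
import re


def to_seconds(t):
    """Parse durations written as in the benchmark list: 1h37', 16'35", 9', 56"."""
    m = re.fullmatch(r"\s*(?:(\d+)h)?\s*(?:(\d+)')?\s*(?:(\d+)\")?\s*", t)
    if not m or not any(m.groups()):
        raise ValueError(f"unrecognized duration: {t!r}")
    h, mins, s = (int(g) if g else 0 for g in m.groups())
    return 3600 * h + 60 * mins + s


comparisons = {
    "C12 macOS M1, 1 core: Guile vs Racket 8.6": ("8'33\"", "1'23\""),
    "C13 Linux Intel, 1 core: Guile vs Racket 8.6": ("1h37'", "16'35\""),
    "C13 macOS M1: Racket 8.6 (1 core) vs Python SymPy": ("14'", "7'50\""),
}
for label, (slow, fast) in comparisons.items():
    print(f"{label}: {to_seconds(slow) / to_seconds(fast):.1f}x")
```

The two Guile/Racket pairs come out at about 6.2x and 5.8x, and the macOS Racket/Python pair at about 1.8x, in line with the ratios stated earlier.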

But this is not really important: I have enough data and computation
results. For k > 13, computing Ck would take many days even in Python, as
it is an NP-hard problem.

But I will check whether other regions of my code are better candidates
for parallelization, because for now parallelism brings almost nothing
(except that with Guile the parallel region is faster, because vectors
are faster than lists in Guile).

My conclusion for now is that Racket 8.6 (confirmed to be based on Chez
Scheme?) is a bit faster than Python :-) and both are 6x faster than
Guile 3 on this algorithm. Note that my algorithm uses more than 8 GB of
data for C13 while Python uses almost no memory. I suppose SymPy only uses
binary data computed the dumb way, whereas my algorithm manipulates
expressions symbolically. I also suppose that my unified minterms, made of
0, 1 and 'x in a list, could be represented as base-3 digits on 2 bits
each, but with a limit of 64/2 = 32 variables, which is never computable
anyway (NP-hard). I suppose changing my algorithm to work like SymPy's
would definitively beat Python, with both Racket and Guile, by removing a
lot of memory allocation and garbage-collector use.
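The 2-bit packing idea can be sketched as follows. The particular codes (0 → 00, 1 → 01, x → 10) are an assumption, chosen so that the Quine-McCluskey merge test reduces to a few bit operations on one machine word. Damien's Scheme code does not currently do this, and all names here are hypothetical.

```python
# Hypothetical 2-bit-per-variable encoding of an implicant over {0, 1, x}.
CODES = {"0": 0b00, "1": 0b01, "x": 0b10}
DECODE = {v: k for k, v in CODES.items()}
MAX_VARS = 64 // 2               # 32 variables fit in one 64-bit word
LOW_BITS = 0x5555555555555555    # the low bit of every 2-bit field


def encode(implicant):
    """Pack a string such as '01x1' into an integer, 2 bits per variable."""
    if len(implicant) > MAX_VARS:
        raise ValueError(f"at most {MAX_VARS} variables fit in 64 bits")
    word = 0
    for symbol in implicant:
        word = (word << 2) | CODES[symbol]
    return word


def decode(word, nvars):
    """Unpack nvars 2-bit fields back into a '0'/'1'/'x' string."""
    symbols = []
    for _ in range(nvars):
        symbols.append(DECODE[word & 0b11])
        word >>= 2
    return "".join(reversed(symbols))


def combine_packed(a, b):
    """Quine-McCluskey merge on packed words: possible only when exactly one
    field differs, and it differs as 0 vs 1 (XOR == 0b01 in that field)."""
    d = a ^ b
    if d == 0 or d & (d - 1):    # zero, or more than one bit differs
        return None
    if d & LOW_BITS == 0:        # a high bit differs: an 'x' is involved
        return None
    # Clear the differing field, then set it to 'x' (0b10).
    return (a & ~(d * 0b11)) | (d << 1)


merged = combine_packed(encode("0101"), encode("0111"))
print(decode(merged, 4))  # 01x1
```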

I will check the data structures of my code again, and I will also try to
publish the full code I use for the benchmarks.

Damien

Post-note: I'm checking again because the code I wrote uses call/cc
(generated by a macro) inside futures, and I'm not sure that is
compatible. When I test it I see no speedup and cannot find what is
wrong; I'm almost back to the beginning of the problem...


end of thread, other threads:[~2022-11-09 14:24 UTC | newest]

Thread overview: 7+ messages
2022-11-06 16:01 speed difference between Guile and Racket (and Python) Damien Mattei
2022-11-06 22:23 ` Speed " Hans Åberg
2022-11-06 22:35   ` Damien Mattei
2022-11-07  9:21 ` speed " Dr. Arne Babenhauserheide
2022-11-07 13:23   ` Zelphir Kaltstahl
2022-11-09 14:24   ` Damien Mattei
2022-11-08  8:03 ` Linus Björnstam
