unofficial mirror of guile-devel@gnu.org 
* wip-ports-refactor
@ 2016-04-06 20:46 Andy Wingo
  2016-04-07  4:16 ` wip-ports-refactor Christopher Allan Webber
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Andy Wingo @ 2016-04-06 20:46 UTC (permalink / raw)
  To: guile-devel

Hi,

I have been working on a refactor to ports.  The goal is to have a
better concurrency story.  Let me tell that story then get down to the
details.

So, right now Guile has a pretty poor concurrency story.  We just have
pthreads, which is great in many ways, but nobody feels like
recommending this to users.  The reason is that when pthreads were
originally added to Guile, they were done in such a way that we could
assume that data races would just be OK.  It's amazing to reflect upon
this, but that's how it is.  Many internal parts of Guile are vulnerable
to corruption when run under multiple kernel threads in parallel.
Consider what happens when you try to load a module from two threads at
the same time.  What happens?  What should happen?  Should it be
possible to load two modules in parallel?  The system hasn't really been
designed as a whole.  Guile has no memory model, as such.  We have
patches over various issues, ad-hoc locks, but it's not in a state where
we can recommend that users seriously use threads.

That said, I've used threads and had good results :) And Guix uses them
in places, though it had to avoid loading modules from threads.  One of
the areas I patched over in 2.2 was ports: I just added a lock to almost
every port.  But that slowed things down, especially on non-x86, and it
would be nice to find a better solution.

Anyway, one part of fixing Guile's concurrency story is to be able to
tell users, "yes, just use kernel threads".  That would be nice.  It
will take some design but it's possible I guess.

However!  Concurrency doesn't necessarily mean parallelism.  By that I
mean, it's possible to have concurrent requests to a web server, for
example, but without involving kernel threads.  But with Guile we have
no story here.  Racket has something it calls "threads", which are in
user-space; Go has its goroutines; Node has a whole story around
callbacks and the main loop; but for Guile there is not much.
Guile-GNOME does this in a way, but not very well, and not in a
nice-to-program way.  More appropriate is 8sync, a new project by Chris
Webber that is designed to be a kind of user-space threading library for
Guile.

I did give a try at prototyping such a thing a long time ago,
"ethreads".  Ethreads are user-space threads, which are really delimited
continuations with a scheduler.  If the thread -- the dynamic extent of
a program that runs within a prompt -- if the thread would block on I/O,
it suspends itself, returning to the scheduler, and then the scheduler
resumes the thread when I/O can continue.  There's an epoll loop
underneath.

That hack seemed to work; I even got the web server working on it, and
ran my web site on it for a while.  The problem was, though, that it
completely bypassed ports.  It made its own port types and buffers and
everything.  That's not really Guile -- that's a library.

                            *  *  *

Which brings us to the port refactor.  Ultimately I see ports as all
having buffers.  These buffers can be accessed from Scheme.  Normal I/O
goes to the buffer first.  When the buffers need filling or emptying,
Scheme code can call Scheme code to do that.  There could be Scheme
dynamic parameters defining whether filling/emptying blocks -- if it
doesn't block, then if the read would block it could call out to some
context to suspend the thread.  Since it's all Scheme code, that
continuation can be resumed as well -- the delimited continuation does
not capture a trampoline through C.  The buffer itself is represented as
a bytevector with a couple of cursors, which gives us some basic
threadsafety without locks on the Scheme side -- Scheme always checks
that accesses are within bounds.
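
To make the buffer idea concrete, here is a rough sketch of what such a
structure could look like on the Scheme side; the record and accessor
names are purely illustrative, not the actual interface in the branch:

    ;; Illustrative only: a port buffer as a bytevector plus two cursors.
    (use-modules (rnrs bytevectors) (srfi srfi-9))

    (define-record-type <port-buffer>
      (make-port-buffer bytevector cur end)
      port-buffer?
      (bytevector port-buffer-bytevector)
      (cur port-buffer-cur set-port-buffer-cur!)   ; next byte to read
      (end port-buffer-end set-port-buffer-end!))  ; one past last valid byte

    ;; Take one byte, or return #f if the buffer needs refilling.  The
    ;; cursor check plus bytevector-u8-ref's own bounds check mean a
    ;; racing access can at worst signal an error, not corrupt memory.
    (define (buffer-take-u8! buf)
      (let ((cur (port-buffer-cur buf))
            (end (port-buffer-end buf)))
        (and (< cur end)
             (let ((byte (bytevector-u8-ref (port-buffer-bytevector buf) cur)))
               (set-port-buffer-cur! buf (+ cur 1))
               byte))))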

But, currently in Guile 2.0 and in master, buffering is handled by the
port implementation.  That means that there is no buffer to expose to
Scheme, and no real path towards implementing efficient I/O operators
that need to grovel in a buffer from Scheme.  It also means that there's
no easy solution for non-blocking I/O, AFAIU.

The wip-port-refactor branch is a step towards centralizing buffering
management within the generic ports code.  It thins the interface to
port implementations, instead assuming that the read/write functions talk
directly to unbuffered mutable stores, as Guile is the part handling the
buffering.  I've documented what I can in the branch.

The commits before the HEAD are fairly trivial I think; it's the last
one that's a doozy.  It doesn't yet remove locks; there's still a lot of
locks, and it's hard to know what we can do without locks given the
leeway given to modern C compilers.  But it's a step.

Going forward we need to define a Scheme data type for ports, and to
allow the read/write procedures to be called from Scheme, and to allow
Scheme implementations of those procedures.  We also need to figure out
how to do non-blocking I/O, both on files and non-files; should we set
all our FD's to O_NONBLOCK?  How does it affect our internal
interfaces?  I do not know yet.
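
For reference, putting an existing file descriptor into non-blocking
mode is already possible today with the fcntl bindings; the open
question is whether the ports code should do this for every FD by
default:

    ;; Set O_NONBLOCK on a file port's descriptor using the existing
    ;; POSIX bindings; reads and writes on the FD then return EAGAIN
    ;; instead of blocking.
    (define (set-port-nonblocking! port)
      (let ((flags (fcntl port F_GETFL)))
        (fcntl port F_SETFL (logior flags O_NONBLOCK))))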

There's still space for different schedulers.  I wouldn't want to
include a scheduler and a thread concept in Guile 2.2.0 I don't think --
but if we can build it in such a way that it seems natural, on top of
ports, then it sounds like a good idea.

Review welcome, especially from Mark, Ludovic, and Chris.

Cheers,

Andy




* Re: wip-ports-refactor
  2016-04-06 20:46 wip-ports-refactor Andy Wingo
@ 2016-04-07  4:16 ` Christopher Allan Webber
  2016-04-12  8:52   ` wip-ports-refactor Andy Wingo
  2016-04-12  9:33 ` wip-ports-refactor Andy Wingo
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Christopher Allan Webber @ 2016-04-07  4:16 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Andy Wingo writes:

> Hi,
>
> I have been working on a refactor to ports.  The goal is to have a
> better concurrency story.  Let me tell that story then get down to the
> details.

Hoo, what an email!  I need to read your code before I can do a full
commentary.  But...

> More appropriate is 8sync, a new project by Chris Webber that is
> designed to be a kind of user-space threading library for Guile.

Hey, thanks!  I hope it's on the right track.

> I did give a try at prototyping such a thing a long time ago,
> "ethreads".  Ethreads are user-space threads, which are really delimited
> continuations with a scheduler.  If the thread -- the dynamic extent of
> a program that runs within a prompt -- if the thread would block on I/O,
> it suspends itself, returning to the scheduler, and then the scheduler
> resumes the thread when I/O can continue.  There's an epoll loop
> underneath.
>
> That hack seemed to work; I even got the web server working on it, and
> ran my web site on it for a while.  The problem was, though, that it
> completely bypassed ports.  It made its own port types and buffers and
> everything.  That's not really Guile -- that's a library.
>
>                             *  *  *
>
> Which brings us to the port refactor.  Ultimately I see ports as all
> having buffers.  These buffers can be accessed from Scheme.  Normal I/O
> goes to the buffer first.  When the buffers need filling or emptying,
> Scheme code can call Scheme code to do that.  There could be Scheme
> dynamic parameters defining whether filling/emptying blocks -- if it
> doesn't block, then if the read would block it could call out to some
> context to suspend the thread.  Since it's all Scheme code, that
> continuation can be resumed as well -- the delimited continuation does
> not capture a trampoline through C.  The buffer itself is represented as
> a bytevector with a couple of cursors, which gives us some basic
> threadsafety without locks on the Scheme side -- Scheme always checks
> that accesses are within bounds.
>
> But, currently in Guile 2.0 and in master, buffering is handled by the
> port implementation.  That means that there is no buffer to expose to
> Scheme, and no real path towards implementing efficient I/O operators
> that need to grovel in a buffer from Scheme.  It also means that there's
> no easy solution for non-blocking I/O, AFAIU.
>
> The wip-port-refactor branch is a step towards centralizing buffering
> management within the generic ports code.  It thins the interface to
> port implementations, instead assuming that the read/write functions talk
> directly to unbuffered mutable stores, as Guile is the part handling the
> buffering.  I've documented what I can in the branch.

So, does this branch replace ethreads, or complement it?  Where should I
be focusing my (currently limited) review / integration attempt energy?
I've been hoping to review ethreads this week but now I'm unsure.  Can
you explain how the efforts currently relate?

> The commits before the HEAD are fairly trivial I think; it's the last
> one that's a doozy.  It doesn't yet remove locks; there's still a lot of
> locks, and it's hard to know what we can do without locks given the
> leeway given to modern C compilers.  But it's a step.
>
> Going forward we need to define a Scheme data type for ports, and to
> allow the read/write procedures to be called from Scheme, and to allow
> Scheme implementations of those procedures.  We also need to figure out
> how to do non-blocking I/O, both on files and non-files; should we set
> all our FD's to O_NONBLOCK?  How does it affect our internal
> interfaces?  I do not know yet.

One other question is if this will help in the "no nice way to do custom
binary ports" stuff that was blocking the
tls-enabled-ports-in-guile-proper thing...

> There's still space for different schedulers.  I wouldn't want to
> include a scheduler and a thread concept in Guile 2.2.0 I don't think --
> but if we can build it in such a way that it seems natural, on top of
> ports, then it sounds like a good idea.

As I've said, I'm not tied to 8sync specifically if doing something more
internally makes more sense.  (Even if I have a nice site and logo
coming together now ;))

Exciting times!
 - Chris




* Re: wip-ports-refactor
  2016-04-07  4:16 ` wip-ports-refactor Christopher Allan Webber
@ 2016-04-12  8:52   ` Andy Wingo
  2016-04-13 14:27     ` wip-ports-refactor Christopher Allan Webber
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Wingo @ 2016-04-12  8:52 UTC (permalink / raw)
  To: Christopher Allan Webber; +Cc: guile-devel

Hi!

Summarizing my reply over IRC:

On Thu 07 Apr 2016 06:16, Christopher Allan Webber <cwebber@dustycloud.org> writes:

> So, does this branch replace ethreads, or complement it?  Where should I
> be focusing my (currently limited) review / integration attempt energy?
> I've been hoping to review ethreads this week but now I'm unsure.  Can
> you explain how the efforts currently relate?

This branch hopes to make the "eports" part of that branch unnecessary.
However actually implementing user-space threads à la ethreads is out of
scope, as is the epoll wrapper.

> One other question is if this will help in the "no nice way to do custom
> binary ports" stuff that was blocking the
> tls-enabled-ports-in-guile-proper thing...

Was that the blocker?  Anyway the current branch's ports are verrrrrrrry
close to R6RS binary ports, so this shouldn't be a difficulty any
more.  I haven't implemented custom binary I/O ports (we have input-only
and output-only but not both) yet, but it should be doable.
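
For context, here is a sketch of what is already possible with the R6RS
constructor in (rnrs io ports); a combined input/output variant is the
missing piece.  This is illustrative code, not code from the branch:

    (use-modules (rnrs io ports) (rnrs bytevectors))

    ;; A custom binary input port that yields bytes from an in-memory
    ;; bytevector.
    (define (bytevector-source-port bv)
      (let ((pos 0))
        (define (read! dst start count)
          (let ((n (min count (- (bytevector-length bv) pos))))
            (bytevector-copy! bv pos dst start n)
            (set! pos (+ pos n))
            n))                                ; 0 means end of file
        (make-custom-binary-input-port "bv-source" read! #f #f #f)))

    ;; (get-u8 (bytevector-source-port (string->utf8 "hi"))) => 104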

> As I've said, I'm not tied to 8sync specifically if doing something more
> internally makes more sense.  (Even if I have a nice site and logo
> coming together now ;))

I think keep rolling with 8sync :)  It has a nice brand, it's filling a
need that probably won't be filled in 2.2.0, it's laying groundwork for
future Guile features.  Eventually I would like user-space threads in
Guile proper, implemented in terms of delimited continuations, and that
implies a scheduler too.  But that's a bit far off.  My goal is to make
it possible to add such a thing during the 2.2.x series, probably first
as a library (8sync) and eventually as a core Guile feature.

Andy




* Re: wip-ports-refactor
  2016-04-06 20:46 wip-ports-refactor Andy Wingo
  2016-04-07  4:16 ` wip-ports-refactor Christopher Allan Webber
@ 2016-04-12  9:33 ` Andy Wingo
  2016-04-14 14:03 ` wip-ports-refactor Ludovic Courtès
  2016-04-24 11:05 ` wip-ports-refactor Chris Vine
  3 siblings, 0 replies; 19+ messages in thread
From: Andy Wingo @ 2016-04-12  9:33 UTC (permalink / raw)
  To: guile-devel

On Wed 06 Apr 2016 22:46, Andy Wingo <wingo@pobox.com> writes:

> I have been working on a refactor to ports.

The status is that in wip-ports-refactor I have changed the internal
port implementation so that ports always have buffers, and those buffers
are always bytevectors (internally encapsulated in a scm_t_port_buffer
struct that has cursors into the buffer).  That way we should be able
to access the port buffers from Scheme safely.

The end-game is to allow programs like Scheme's `read' to be
suspendable.  That means, whenever they would peek-char/read-char and
input is unavailable, the program would suspend to the scheduler by
aborting to a prompt, and resume the resulting continuation when input
becomes available.  Likewise for writing.  To do this, all port
functions need to be implemented in Scheme, because for a delimited
continuation to be resumed, it has to only capture Scheme activations,
not C activations.
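
As a toy illustration of that suspend/resume pattern (not the branch's
actual code): a reader aborts to a prompt when input is unavailable, and
the scheduler stashes the delimited continuation to resume later:

    ;; Sketch of suspending a would-block computation with Guile's
    ;; delimited continuations.  A real scheduler would call K when
    ;; epoll/select reports the port as readable.
    (define io-prompt (make-prompt-tag "io"))

    (define (wait-for-readable port)
      ;; Suspend: capture the continuation up to the prompt.
      (abort-to-prompt io-prompt port))

    (define (run-suspendable thunk resume-later!)
      (call-with-prompt io-prompt
        thunk
        (lambda (k port)
          ;; THUNK suspended waiting on PORT; remember K so the thread
          ;; can be resumed once input arrives.
          (resume-later! port k))))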

This is obviously a gnarly task.  It still makes sense to have C
functions that work on ports -- and specifically, that C have access to
the port buffers.  But it would be fine for C ports to call out to
Scheme to fill their read buffers / flush their write buffers.

So the near-term plan is to move the read/write/etc ptob methods to be
Scheme functions -- probably gsubr wrappers for now (for the existing port
types).  Then we need to start allowing I/O functions to be implemented
in Scheme -- in (ice-9 ports) or so.

But, you don't want Scheme code to have to import (ice-9 ports).  You
want existing code that uses read-char and so on to become suspendable.
So, we will replace core I/O bindings in boot-9 with imported bindings
from (ice-9 ports).  That will also allow us to trim the set of bindings
defined in boot-9 itself (before (ice-9 ports) is loaded) to the minimum
set that is needed to boot Guile.
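
A minimal sketch of that replacement step, assuming (ice-9 ports)
exports the binding we want to install over the boot-9 placeholder (the
exact mechanism in the branch may differ):

    ;; Overwrite the early boot-9 definition with the real one once
    ;; (ice-9 ports) is available.
    (set! read-char
          (module-ref (resolve-interface '(ice-9 ports)) 'read-char))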

So the plan is:

  1. Create (ice-9 ports) module

     - it will do load-extension to cause ports.c to define I/O routines

     - it exports all i/o routines that are exported by ports.c, and
       perhaps by other files as well

     - bindings from (ice-9 ports) are imported into boot-9, augmenting
       the minimal set of bindings defined in boot-9, and replacing the
       existing minimal bindings via set!

  2. Add Scheme interface to port buffers, make internal to (ice-9
     ports)

     - this should allow I/O routines to get a port's read or write
       buffers, grovel in the bytes, update cursors, and call the read
       or write functions to fill or empty them

  3. Start rewriting I/O routines in Scheme

  4. Add/adapt a non-blocking interface

     - Currently port read/write functions are blocking.  Probably we
       should change their semantics to be nonblocking.  This would
       allow Guile to detect when to suspend a computation.

     - Nonblocking ports need an FD to select on; if they don't have
       one, a write or read that consumes 0 bytes indicates EOF

     - Existing blocking interfaces would be shimmed by "select"-ing on
       the port until it's writable in a loop

  5. Add "current read waiter" / "current write waiter" abstraction from
     the ethreads branch

     - These are parameters (dynamic bindings) that are procedures that
       define what to do when a read or write would block.  By default I
       think probably they should select in a loop to emulate blocking
       behavior.  They could be parameterized to suspend the computation
       to a scheduler, though.  (A rough sketch follows this list.)
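
Here is a rough sketch of what such a read waiter could look like; the
names and the blocking default are assumptions, not the branch's actual
interface:

    ;; A "current read waiter": a parameter holding a procedure that
    ;; decides what to do when a read on PORT would block.  The default
    ;; emulates blocking by select-ing on the port; a scheduler would
    ;; parameterize it to suspend the calling green thread instead.
    (define current-read-waiter
      (make-parameter
       (lambda (port)
         ;; Block until PORT is readable, then return so the caller
         ;; retries the read.
         (select (list port) '() '()))))

    ;; Example: run THUNK so that blocked reads call SUSPEND-ON-READ.
    (define (with-read-waiter suspend-on-read thunk)
      (parameterize ((current-read-waiter suspend-on-read))
        (thunk)))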

Finally there is a question about speed.  I expect that for buffered
ports, I/O from C will have a minimal slowdown.  For unbuffered ports,
the slowdown would be larger, because the cost of filling and emptying
ports is higher with a call from C to Scheme (and then back, for
read/write functions actually implemented in C.)  But for Scheme, I
expect that generally throughput goes up, as we will be able to build
flexible I/O routines that can access the buffer directly, both because
with this branch buffering is uniformly handled in the generic port
code, and also because Scheme avoids the Scheme->C penalty in common
cases.  We can provide compiler support for accessing the port buffer,
if needed, but hopefully we can avoid that.

Finally finally, there is still the question about locks.  I don't know
the answer here.  I think it's likely that we can have concurrent access
to port buffers without locks, but I suspect that anything that accesses
mutable port state should probably be protected by a lock -- but
probably not a re-entrant lock, because the operations called with that
lock wouldn't call out to any user code.  That means that read/write
functions from port implementations would have to bake in their own
threadsafety, but probably that's OK; for file ports, for example, the
threadsafety is baked into the kernel.  Atomic accessors are also a
possibility if there is still overhead.  I think also we could remove
all of the _unlocked functions from our API and from our internals in
that case, and just lock as appropriate, understanding that the perf
impact should be minimal.

Andy




* Re: wip-ports-refactor
  2016-04-12  8:52   ` wip-ports-refactor Andy Wingo
@ 2016-04-13 14:27     ` Christopher Allan Webber
  0 siblings, 0 replies; 19+ messages in thread
From: Christopher Allan Webber @ 2016-04-13 14:27 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Andy Wingo writes:

> Hi!
>
> Summarizing my reply over IRC:
>
> On Thu 07 Apr 2016 06:16, Christopher Allan Webber <cwebber@dustycloud.org> writes:
>
>> So, does this branch replace ethreads, or complement it?  Where should I
>> be focusing my (currently limited) review / integration attempt energy?
>> I've been hoping to review ethreads this week but now I'm unsure.  Can
>> you explain how the efforts currently relate?
>
> This branch hopes to make the "eports" part of that branch unnecessary.
> However actually implementing user-space threads à la ethreads is out of
> scope, as is the epoll wrapper.
>
>> One other question is if this will help in the "no nice way to do custom
>> binary ports" stuff that was blocking the
>> tls-enabled-ports-in-guile-proper thing...
>
> Was that the blocker?

Ludovic explains here:
  https://lists.gnu.org/archive/html/guile-devel/2015-09/msg00042.html

> Anyway the current branch's ports are verrrrrrrry close to R6RS binary
> ports, so this shouldn't be a difficulty any more.  I haven't
> implemented custom binary I/O ports (we have input-only and
> output-only but not both) yet, but it should be doable.

Yay!  It does sound like that might fix it.  (Admittedly it's a bit
beyond me.)

>> As I've said, I'm not tied to 8sync specifically if doing something more
>> internally makes more sense.  (Even if I have a nice site and logo
>> coming together now ;))
>
> I think keep rolling with 8sync :)  It has a nice brand, it's filling a
> need that probably won't be filled in 2.2.0, it's laying groundwork for
> future Guile features.

Yay!  Ok, I will do so.

> Eventually I would like user-space threads in Guile proper,
> implemented in terms of delimited continuations, and that implies a
> scheduler too.  But that's a bit far off.  My goal is to make it
> possible to add such a thing during the 2.2.x series, probably first
> as a library (8sync) and eventually as a core Guile feature.
>
> Andy

Sounds good.  But here's a question: we haven't accepted many
contributions yet to 8sync.  Would it make sense now to require
copyright assignment for the project?  (We don't, yet.)  That might slow
development a bit but could make future merging into Guile, if such a
thing were to be done, easier.

 - Chris




* Re: wip-ports-refactor
  2016-04-06 20:46 wip-ports-refactor Andy Wingo
  2016-04-07  4:16 ` wip-ports-refactor Christopher Allan Webber
  2016-04-12  9:33 ` wip-ports-refactor Andy Wingo
@ 2016-04-14 14:03 ` Ludovic Courtès
  2016-04-17  8:49   ` wip-ports-refactor Andy Wingo
  2016-04-24 11:05 ` wip-ports-refactor Chris Vine
  3 siblings, 1 reply; 19+ messages in thread
From: Ludovic Courtès @ 2016-04-14 14:03 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Hi!

Andy Wingo <wingo@pobox.com> skribis:

> I have been working on a refactor to ports.  The goal is to have a
> better concurrency story.  Let me tell that story then get down to the
> details.

In addition to concurrency and thread-safety, I’m very much interested
in the impact of this change on the API (I’ve always found the port API
in C to be really bad), on the flexibility it would provide, and on
performance—‘read-char’ and ‘get-u8’ are currently prohibitively slow!

> Going forward we need to define a Scheme data type for ports, and to
> allow the read/write procedures to be called from Scheme, and to allow
> Scheme implementaitons of those procedures.  We also need to figure out
> how to do non-blocking I/O, both on files and non-files; should we set
> all our FD's to O_NONBLOCK?  How does it affect our internal
> interfaces?  I do not know yet.

I think this part can come later, after the refactoring is done.

> There's still space for different schedulers.  I wouldn't want to
> include a scheduler and a thread concept in Guile 2.2.0 I don't think --
> but if we can build it in such a way that it seems natural, on top of
> ports, then it sounds like a good idea.

I agree.  If the new implementation gives users more flexibility, then
people will be able to easily experiment with things like 8sync or your
ethreads branch.  From there on, we’ll have a better idea of whether a
scheduler framework or something should be added to Guile proper.

I’ll take a look at the code.

Thanks a lot for fearlessly diving into this!  :-)

Ludo’.




* Re: wip-ports-refactor
  2016-04-14 14:03 ` wip-ports-refactor Ludovic Courtès
@ 2016-04-17  8:49   ` Andy Wingo
  2016-04-17 10:44     ` wip-ports-refactor Ludovic Courtès
  2016-05-10 15:02     ` wip-ports-refactor Andy Wingo
  0 siblings, 2 replies; 19+ messages in thread
From: Andy Wingo @ 2016-04-17  8:49 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Hi :)

On Thu 14 Apr 2016 16:03, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> skribis:
>
>> I have been working on a refactor to ports.  The goal is to have a
>> better concurrency story.  Let me tell that story then get down to the
>> details.
>
> In addition to concurrency and thread-safety, I’m very much interested
> in the impact of this change on the API (I’ve always found the port API
> in C to be really bad), on the flexibility it would provide, and on
> performance—‘read-char’ and ‘get-u8’ are currently prohibitively slow!

Yeah.  Of course improving the port internals is technically a breaking
change, but I think probably the set of people that have implemented
ports using the C API can be counted on two hands, and I hope to find
everyone and help them adapt :)

From the speed side, I think that considering read-char to be
prohibitively slow is an incorrect diagnosis.  First let's define a
helper:

    (define-syntax-rule (do-times n exp)
      (let lp ((i 0)) (let ((res exp)) (if (< i n) (lp (1+ i)) res))))

I want to test four things.

    ;; 1. How long a loop up to 10 million takes (baseline measurement).
    (let ((port (open-input-string "s"))) (do-times #e1e7 1))

    ;; 2. A call to a simple Scheme function.
    (define (foo port) 42)
    (let ((port (open-input-string "s"))) (do-times #e1e7 (foo port)))

    ;; 3. A call to a port subr.
    (let ((port (open-input-string "s"))) (do-times #e1e7 (port-line port)))

    ;; 4. A call to a port subr that touches the buffer.
    (let ((port (open-input-string "s"))) (do-times #e1e7 (peek-char port)))

The results:

                      | baseline | foo    | port-line | peek-char
    ------------------+----------+--------+-----------+----------
    guile 2.0         | 0.269s   | 0.845s | 1.067s    | 1.280s
    guile master      | 0.058s   | 0.224s | 0.225s    | 0.433s
    wip-port-refactor | 0.058s   | 0.220s | 0.226s    | 0.375s

These were single measurements at the REPL on my i7-5600U, run with
--no-debug.  The results were fairly consistent.  Note that because this
is a loop, Guile 2.2's compiler gets some "unfair" advantages related to
loop-invariant code motion and peeling; but real parsers etc written on
top of read-char will also have loops, so to a degree it's OK.
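
For anyone reproducing this kind of measurement, the REPL's ,time
meta-command is sufficient; for example (method only, not one of the
rows above):

    ;; at a REPL started with --no-debug
    ,time (let ((port (open-input-string "s")))
            (do-times #e1e7 (peek-char port)))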

Conclusions:

  1. Guile 2.2 makes calling a subr just as cheap as calling a Scheme
     function.

  2. The overhead of using Guile 2.0 is much greater than the overhead
     of calling peek-char.

  3. peek-char is slower than other leaf functions in Guile 2.2 but only
     by 2x or so; I am sure it can be faster but I don't know by how
     much.  Consider that it has to:

       1. type-check the argument
       2. get the port buffer and cursors
       3. if there is enough data in the buffer to decode a char, do
          it.  otherwise, slow-path.

     If we consider implementing this in Scheme, it might get slower
     than it currently is in 2.2, because of the switch from C->C calls
     (internal to ports.c and other C files) to Scheme->Scheme calls,
     probably with some additional subr calls to get state from the
     port.  We might gain some of that back by removing the lock; dunno.

It would be nice to be able to decode chars from UTF-8 or ISO-8859-1
ports from Scheme.  But we always have to be able to call out to iconv
too.  Mark has mused on making the port buffer always UTF-8, but I don't
quite see how this could work.  I guess you could have a second port
buffer for decoded UTF-8 chars, but that starts to look quite
complicated to me.

Anyway.  I think that given the huge performance window opened up to us
by the 2.0->2.2 switch, we should treat speed as important but not
primary -- when given a choice between speed and
maintainability, or speed and the ability to suspend a port, we
shouldn't choose speed.

That said, the real way to make port operations fast is (1) to buffer
the port, and (2) to operate on the buffer directly instead of fetching
data octet-by-octet.  Exposing the port buffer to Scheme allows this
kind of punch-through optimization to be implemented where needed.

Cheers,

Andy




* Re: wip-ports-refactor
  2016-04-17  8:49   ` wip-ports-refactor Andy Wingo
@ 2016-04-17 10:44     ` Ludovic Courtès
  2016-04-19  8:00       ` wip-ports-refactor Andy Wingo
  2016-05-10 15:02     ` wip-ports-refactor Andy Wingo
  1 sibling, 1 reply; 19+ messages in thread
From: Ludovic Courtès @ 2016-04-17 10:44 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Andy Wingo <wingo@pobox.com> skribis:

> I want to test four things.
>
>     ;; 1. How long a loop up to 10 million takes (baseline measurement).
>     (let ((port (open-input-string "s"))) (do-times #e1e7 1))
>
>     ;; 2. A call to a simple Scheme function.
>     (define (foo port) 42)
>     (let ((port (open-input-string "s"))) (do-times #e1e7 (foo port)))
>
>     ;; 3. A call to a port subr.
>     (let ((port (open-input-string "s"))) (do-times #e1e7 (port-line port)))
>
>     ;; 4. A call to a port subr that touches the buffer.
>     (let ((port (open-input-string "s"))) (do-times #e1e7 (peek-char port)))
>
> The results:
>
>                       | baseline | foo    | port-line | peek-char
>     ------------------+----------+--------+-----------+----------
>     guile 2.0         | 0.269s   | 0.845s | 1.067s    | 1.280s
>     guile master      | 0.058s   | 0.224s | 0.225s    | 0.433s
>     wip-port-refactor | 0.058s   | 0.220s | 0.226s    | 0.375s

Oh, nice!  (By “prohibitively slow” I was referring to 2.0.)

For ‘peek-char’, isn’t there also the fact that string ports in 2.2 are
UTF-8 by default, so we get the fast path, whereas in 2.0 there's
‘%default-port-encoding’, which could be something else, leading to the
slow path?

Would be nice to check if doing:

  (with-fluids ((%default-port-encoding "UTF-8"))
    (open-input-string "s"))

makes a difference in the 2.0 measurements.

I hadn’t realized that subr calls had become this much cheaper in 2.2, that’s
awesome.

> Anyway.  I think that given the huge performance window opened up to us
> by the 2.0->2.2 switch, we should treat speed as important but not
> primary -- when given a choice between speed and
> maintainability, or speed and the ability to suspend a port, we
> shouldn't choose speed.

Agreed.

> That said, the real way to make port operations fast is (1) to buffer
> the port, and (2) to operate on the buffer directly instead of fetching
> data octet-by-octet.  Exposing the port buffer to Scheme allows this
> kind of punch-through optimization to be implemented where needed.

In fact my comment about speed was because I was expecting the
port-refactor work to improve performance for things like ‘read-char’,
which seems to be the case already.

Thank you!

Ludo’.




* Re: wip-ports-refactor
  2016-04-17 10:44     ` wip-ports-refactor Ludovic Courtès
@ 2016-04-19  8:00       ` Andy Wingo
  2016-04-19 14:15         ` wip-ports-refactor Ludovic Courtès
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Wingo @ 2016-04-19  8:00 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Hi,

On Sun 17 Apr 2016 12:44, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> skribis:
>
>> I want to test four things.
>>
>>     ;; 1. How long a loop up to 10 million takes (baseline measurement).
>>     (let ((port (open-input-string "s"))) (do-times #e1e7 1))
>>
>>     ;; 2. A call to a simple Scheme function.
>>     (define (foo port) 42)
>>     (let ((port (open-input-string "s"))) (do-times #e1e7 (foo port)))
>>
>>     ;; 3. A call to a port subr.
>>     (let ((port (open-input-string "s"))) (do-times #e1e7 (port-line port)))
>>
>>     ;; 4. A call to a port subr that touches the buffer.
>>     (let ((port (open-input-string "s"))) (do-times #e1e7 (peek-char port)))
>>
>> The results:
>>
>>                       | baseline | foo    | port-line | peek-char
>>     ------------------+----------+--------+-----------+----------
>>     guile 2.0         | 0.269s   | 0.845s | 1.067s    | 1.280s
>>     guile master      | 0.058s   | 0.224s | 0.225s    | 0.433s
>>     wip-port-refactor | 0.058s   | 0.220s | 0.226s    | 0.375s
>
> Oh, nice!  (By “prohibitively slow” I was referring to 2.0.)
>
> For ‘peek-char’, isn’t there also the fact that string ports in 2.2 are
> UTF-8 by default, so we get the fast path, whereas in 2.0 there's
> ‘%default-port-encoding’, which could be something else, leading to the
> slow path?

I tried making sure the string port was a UTF-8 port but that made no
difference to the 2.0 peek-char times.  I suspect this is because I ran
it at the REPL, which had done a setlocale() already.  But perhaps
that's not the right explanation.

Andy




* Re: wip-ports-refactor
  2016-04-19  8:00       ` wip-ports-refactor Andy Wingo
@ 2016-04-19 14:15         ` Ludovic Courtès
  0 siblings, 0 replies; 19+ messages in thread
From: Ludovic Courtès @ 2016-04-19 14:15 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Hello!

Andy Wingo <wingo@pobox.com> skribis:

> On Sun 17 Apr 2016 12:44, ludo@gnu.org (Ludovic Courtès) writes:
>
>> Andy Wingo <wingo@pobox.com> skribis:
>>
>>> I want to test four things.
>>>
>>>     ;; 1. How long a loop up to 10 million takes (baseline measurement).
>>>     (let ((port (open-input-string "s"))) (do-times #e1e7 1))
>>>
>>>     ;; 2. A call to a simple Scheme function.
>>>     (define (foo port) 42)
>>>     (let ((port (open-input-string "s"))) (do-times #e1e7 (foo port)))
>>>
>>>     ;; 3. A call to a port subr.
>>>     (let ((port (open-input-string "s"))) (do-times #e1e7 (port-line port)))
>>>
>>>     ;; 4. A call to a port subr that touches the buffer.
>>>     (let ((port (open-input-string "s"))) (do-times #e1e7 (peek-char port)))
>>>
>>> The results:
>>>
>>>                       | baseline | foo    | port-line | peek-char
>>>     ------------------+----------+--------+-----------+----------
>>>     guile 2.0         | 0.269s   | 0.845s | 1.067s    | 1.280s
>>>     guile master      | 0.058s   | 0.224s | 0.225s    | 0.433s
>>>     wip-port-refactor | 0.058s   | 0.220s | 0.226s    | 0.375s
>>
>> Oh, nice!  (By “prohibitively slow” I was referring to 2.0.)
>>
>> For ‘peek-char’, isn’t there also the fact that string ports in 2.2 are
>> UTF-8 by default, so we get the fast path, whereas in 2.0 there's
>> ‘%default-port-encoding’, which could be something else, leading to the
>> slow path?
>
> I tried making sure the string port was a UTF-8 port but that made no
> difference to the 2.0 peek-char times.  I suspect this is because I ran
> it at the REPL, which had done a setlocale() already.  But perhaps
> that's not the right explanation.

It’s definitely the case if you use a UTF-8 locale.

Thanks for checking!

Ludo’.




* Re: wip-ports-refactor
  2016-04-06 20:46 wip-ports-refactor Andy Wingo
                   ` (2 preceding siblings ...)
  2016-04-14 14:03 ` wip-ports-refactor Ludovic Courtès
@ 2016-04-24 11:05 ` Chris Vine
  2016-05-10 14:30   ` wip-ports-refactor Andy Wingo
  3 siblings, 1 reply; 19+ messages in thread
From: Chris Vine @ 2016-04-24 11:05 UTC (permalink / raw)
  To: guile-devel

On Wed, 06 Apr 2016 22:46:28 +0200
Andy Wingo <wingo@pobox.com> wrote:
> So, right now Guile has a pretty poor concurrency story.  We just have
> pthreads, which is great in many ways, but nobody feels like
> recommending this to users.  The reason is that when pthreads were
> originally added to Guile, they were done in such a way that we could
> assume that data races would just be OK.  It's amazing to reflect upon
> this, but that's how it is.  Many internal parts of Guile are
> vulnerable to corruption when run under multiple kernel threads in
> parallel. Consider what happens when you try to load a module from
> two threads at the same time.  What happens?  What should happen?
> Should it be possible to load two modules in parallel?  The system
> hasn't really been designed as a whole.  Guile has no memory model,
> as such.  We have patches over various issues, ad-hoc locks, but it's
> not in a state where we can recommend that users seriously use
> threads.

I am not going to comment on the rest of your post, because you know
far more about it than I could hope to, but on the question of guile's
thread implementation, it seems to me to be basically sound if you
avoid obvious global state.  I have had test code running for hours,
indeed days, without any appearance of data races or other incorrect
behaviour on account of guile's thread implementation.  Global state is
an issue.  Module loading (which you mention) is an obvious one, but
other things like setting load paths don't look to be thread safe
either.

It would be disappointing to give up on the current thread
implementation.  Better, I think, in the interim is to document
what is not thread-safe.  Some attempts at thread safety are in my view
a waste of time anyway, including trying to produce individual ports
which can safely be accessed in multiple threads.  Multi-threading with
ports requires the prevention of interleaving, not just the prevention
of data races, and that is I think best done by locking at the user
level rather than the library level.

Co-operative multi-tasking using asynchronous frameworks such as 8sync
or another one in which I have an interest, and pre-emptive
multi-tasking using a scheduler and "green" threads, are all very well
but they do not enable use of more than one processor, and more
importantly (because I recognise that guile may not necessarily be
intended for use cases needing multiple processors for performance
reasons), they can be more difficult to use.  In particular, anything
which makes a blocking system call will wedge the whole program.

Chris




* Re: wip-ports-refactor
  2016-04-24 11:05 ` wip-ports-refactor Chris Vine
@ 2016-05-10 14:30   ` Andy Wingo
  2016-05-11 10:42     ` wip-ports-refactor Chris Vine
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Wingo @ 2016-05-10 14:30 UTC (permalink / raw)
  To: Chris Vine; +Cc: guile-devel

Hi :)

On Sun 24 Apr 2016 13:05, Chris Vine <chris@cvine.freeserve.co.uk> writes:

> on the question of guile's thread implementation, it seems to me to be
> basically sound if you avoid obvious global state.  I have had test
> code running for hours, indeed days, without any appearance of data
> races or other incorrect behaviour on account of guile's thread
> implementation.  Global state is an issue.  Module loading (which you
> mention) is an obvious one, but other things like setting load paths
> don't look to be thread safe either.

I think we have no plans for giving up pthreads.  The problem is that,
as you say, if there is no shared state, and your architecture has a
reasonable memory model (Intel's memory model is really great to
program against), then you're fine.  But if you don't have a good mental
model of what is shared state, or your architecture doesn't serialize
loads and stores... well, there things are likely to break.

I like to recommend solutions that will absolutely work and never crash.
(They could throw errors, but that doesn't crash Guile.)  I can't do
that with threads -- not right now anyway.  If you know what you're
doing though, go ahead and use them :)

Andy




* Re: wip-ports-refactor
  2016-04-17  8:49   ` wip-ports-refactor Andy Wingo
  2016-04-17 10:44     ` wip-ports-refactor Ludovic Courtès
@ 2016-05-10 15:02     ` Andy Wingo
  2016-05-10 16:53       ` wip-ports-refactor Andy Wingo
                         ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: Andy Wingo @ 2016-05-10 15:02 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Greets,

On Sun 17 Apr 2016 10:49, Andy Wingo <wingo@pobox.com> writes:

>                       | baseline | foo    | port-line | peek-char
>     ------------------+----------+--------+-----------+----------
>     guile 2.0         | 0.269s   | 0.845s | 1.067s    | 1.280s
>     guile master      | 0.058s   | 0.224s | 0.225s    | 0.433s
>     wip-port-refactor | 0.058s   | 0.220s | 0.226s    | 0.375s

So, I have completed the move to port buffers that are exposed to
Scheme.  I also ported the machinery needed to read characters and bytes
to Scheme, while keeping the C code around.  The results are a bit
frustrating.  Here I'm going to use a file that contains only latin1
characters:

  (with-output-to-file "/tmp/testies.txt" (lambda () (do-times #e1e6 (write-char #\a))))

This is in a UTF-8 locale.  OK.  So we have 10M "a" characters.  I now
want to test these things:

  1. peek-char, 1e7 times.
  2. read-char, 1e7 times.
  3. lookahead-u8, 1e7 times.  (Call it peek-byte.)
  4. get-u8, 1e7 times.  (Call it read-byte.)

                       | peek-char | read-char | peek-byte | read-byte
  ---------------------+-----------+-----------+-----------+----------
  2.0                  | 0.811s    | 0.711s    | 0.619s    | 0.623s
  master               | 0.410s    | 0.331s    | 0.428s    | 0.411s
  port-refactor C      | 0.333s    | 0.358s    | 0.265s    | 0.245s
  port-refactor Scheme | 1.041s    | 1.820s    | 0.682s    | 0.727s

Again, measurements on my i7-5600U, best of three, --no-debug.

Conclusions:

  1. In Guile master and 2.0, reading is faster than peeking, because it
     does a read then a putback.  In wip-port-refactor, the reverse is
     true: peeking fills the buffer, and reading advances the buffer
     pointers.

  2. Scheme appears to be about 3-4 times slower than C in
     port-refactor.  It's slower than 2.0, unfortunately.  I am certain
     that we will get the difference back when we get native compilation
     but I don't know when that would be.

  3. There are some compiler improvements that could help Scheme
     performance too.  For example the bit that updates the port
     positions is not optimal.  We could expose it from C of course.

Note that this Scheme implementation passes ports.test, so there
shouldn't be any hidden surprises.

I am not sure what to do, to be honest.  I think I would switch to
Scheme if it let me throw away the C code, but I don't see the path
forward on that right now due to bootstrap reasons.  I think if I could
golf `read-char' down to 1.100s or so it would become more palatable.

Andy




* Re: wip-ports-refactor
  2016-05-10 15:02     ` wip-ports-refactor Andy Wingo
@ 2016-05-10 16:53       ` Andy Wingo
  2016-05-11 14:00       ` wip-ports-refactor Christopher Allan Webber
  2016-05-11 14:23       ` wip-ports-refactor Ludovic Courtès
  2 siblings, 0 replies; 19+ messages in thread
From: Andy Wingo @ 2016-05-10 16:53 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

On Tue 10 May 2016 17:02, Andy Wingo <wingo@pobox.com> writes:

>   (with-output-to-file "/tmp/testies.txt" (lambda () (do-times #e1e6 (write-char #\a))))

Sorry, I meant #e1e7.  The file really does have 10M characters.
Actually 10M+1 because of a do-times bug :P
>
> This is in a UTF-8 locale.  OK.  So we have 10M "a" characters.  I now
> want to test these things:
>
>   1. peek-char, 1e7 times.
>   2. read-char, 1e7 times.
>   3. lookahead-u8, 1e7 times.  (Call it peek-byte.)
>   4. get-u8, 1e7 times.  (Call it read-byte.)
>
>                        | peek-char | read-char | peek-byte | read-byte
>   ---------------------+-----------+-----------+-----------+----------
>   2.0                  | 0.811s    | 0.711s    | 0.619s    | 0.623s
>   master               | 0.410s    | 0.331s    | 0.428s    | 0.411s
>   port-refactor C      | 0.333s    | 0.358s    | 0.265s    | 0.245s
>   port-refactor Scheme | 1.041s    | 1.820s    | 0.682s    | 0.727s
>
> Again, measurements on my i7-5600U, best of three, --no-debug.
>
> Conclusions:
>
>   1. In Guile master and 2.0, reading is faster than peeking, because it
>      does a read then a putback.  In wip-port-refactor, the reverse is
>      true: peeking fills the buffer, and reading advances the buffer
>      pointers.
>
>   2. Scheme appears to be about 3-4 times slower than C in
>      port-refactor.  It's slower than 2.0, unfortunately.  I am certain
>      that we will get the difference back when we get native compilation
>      but I don't know when that would be.
>
>   3. There are some compiler improvements that could help Scheme
>      performance too.  For example the bit that updates the port
>      positions is not optimal.  We could expose it from C of course.
>
> Note that this Scheme implementation passes ports.test, so there
> shouldn't be any hidden surprises.
>
> I am not sure what to do, to be honest.  I think I would switch to
> Scheme if it let me throw away the C code, but I don't see the path
> forward on that right now due to bootstrap reasons.  I think if I could
> golf `read-char' down to 1.100s or so it would become more palatable.
>
> Andy




* Re: wip-ports-refactor
  2016-05-10 14:30   ` wip-ports-refactor Andy Wingo
@ 2016-05-11 10:42     ` Chris Vine
  2016-05-12  6:16       ` wip-ports-refactor Andy Wingo
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Vine @ 2016-05-11 10:42 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

On Tue, 10 May 2016 16:30:30 +0200
Andy Wingo <wingo@pobox.com> wrote:
> I think we have no plans for giving up pthreads.  The problem is that,
> as you say, if there is no shared state, and your architecture has a
> reasonable memory model (Intel's memory model is really great to
> program against), then you're fine.  But if you don't have a good mental
> model of what is shared state, or your architecture doesn't serialize
> loads and stores... well, there things are likely to break.

Hi Andy,

That I wasn't expecting.  So you are saying that some parts of guile
rely on the ordering guarantees of the x86 memory model (or something
like it) with respect to atomic operations on some internal localised
shared state[1]?  Of course, if guile is unduly economical with its
synchronisation on atomics, that doesn't stop the compiler doing some
reordering for you, particularly now there is a C11 memory model.

Looking at the pthread related stuff in libguile, it seems to be
written by someone/people who know what they are doing.  Are you
referring specifically to the guile VM, and if so is guile-2.2 likely
to be more problematic than guile-2.0?

Chris

[1] I am not talking about things like the loading of guile modules
here, which involves global shared state and probably can't be done lock
free (and doesn't need to be) and may require other higher level
synchronisation such as mutexes.




* Re: wip-ports-refactor
  2016-05-10 15:02     ` wip-ports-refactor Andy Wingo
  2016-05-10 16:53       ` wip-ports-refactor Andy Wingo
@ 2016-05-11 14:00       ` Christopher Allan Webber
  2016-05-11 14:23       ` wip-ports-refactor Ludovic Courtès
  2 siblings, 0 replies; 19+ messages in thread
From: Christopher Allan Webber @ 2016-05-11 14:00 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel

Andy Wingo writes:

> Greets,
>
> On Sun 17 Apr 2016 10:49, Andy Wingo <wingo@pobox.com> writes:
>
>>                       | baseline | foo    | port-line | peek-char
>>     ------------------+----------+--------+-----------+----------
>>     guile 2.0         | 0.269s   | 0.845s | 1.067s    | 1.280s
>>     guile master      | 0.058s   | 0.224s | 0.225s    | 0.433s
>>     wip-port-refactor | 0.058s   | 0.220s | 0.226s    | 0.375s
>
> So, I have completed the move to port buffers that are exposed to
> Scheme.  I also ported the machinery needed to read characters and bytes
> to Scheme, while keeping the C code around.  The results are a bit
> frustrating.  Here I'm going to use a file that contains only latin1
> characters:
>
>   (with-output-to-file "/tmp/testies.txt" (lambda () (do-times #e1e6 (write-char #\a))))
>
> This is in a UTF-8 locale.  OK.  So we have 10M "a" characters.  I now
> want to test these things:
>
>   1. peek-char, 1e7 times.
>   2. read-char, 1e7 times.
>   3. lookahead-u8, 1e7 times.  (Call it peek-byte.)
>   4. get-u8, 1e7 times.  (Call it read-byte.)
>
>                        | peek-char | read-char | peek-byte | read-byte
>   ---------------------+-----------+-----------+-----------+----------
>   2.0                  | 0.811s    | 0.711s    | 0.619s    | 0.623s
>   master               | 0.410s    | 0.331s    | 0.428s    | 0.411s
>   port-refactor C      | 0.333s    | 0.358s    | 0.265s    | 0.245s
>   port-refactor Scheme | 1.041s    | 1.820s    | 0.682s    | 0.727s
>
> Again, measurements on my i7-5600U, best of three, --no-debug.
>
> Conclusions:
>
>   1. In Guile master and 2.0, reading is faster than peeking, because it
>      does a read then a putback.  In wip-port-refactor, the reverse is
>      true: peeking fills the buffer, and reading advances the buffer
>      pointers.
>
>   2. Scheme appears to be about 3-4 times slower than C in
>      port-refactor.  It's slower than 2.0, unfortunately.  I am certain
>      that we will get the difference back when we get native compilation
>      but I don't know when that would be.
>
>   3. There are some compiler improvements that could help Scheme
>      performance too.  For example the bit that updates the port
>      positions is not optimal.  We could expose it from C of course.
>
> Note that this Scheme implementation passes ports.test, so there
> shouldn't be any hidden surprises.
>
> I am not sure what to do, to be honest.  I think I would switch to
> Scheme if it let me throw away the C code, but I don't see the path
> forward on that right now due to bootstrap reasons.  I think if I could
> golf `read-char' down to 1.100s or so it would become more palatable.
>
> Andy

Happily at least, none of these benchmarks are *that much* slower than
Guile 2.0.  So most "present day" users won't be noticing a slowdown in
IO if this slipped into the next release.

You're probably right (is my vague and uninformed suspicion) that native
compilation would speed it up.

My thoughts are: if this refactor could bring us closer to more useful
code for everyday users, a small slowdown over 2.0 is not so bad.  Eg,
if we could get SSL support, and buffered reads with prompts,
etc... those are good features.  So if you had my vote I'd say: forge
ahead on adding those, and if they come out well, then I think this
merge is worth it anyway, despite a small slowdown in IO over 2.0.
Hopefully we'll get it back in the future anyway!

 - Chris




* Re: wip-ports-refactor
  2016-05-10 15:02     ` wip-ports-refactor Andy Wingo
  2016-05-10 16:53       ` wip-ports-refactor Andy Wingo
  2016-05-11 14:00       ` wip-ports-refactor Christopher Allan Webber
@ 2016-05-11 14:23       ` Ludovic Courtès
  2016-05-12  8:15         ` wip-ports-refactor Andy Wingo
  2 siblings, 1 reply; 19+ messages in thread
From: Ludovic Courtès @ 2016-05-11 14:23 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Hello!

Andy Wingo <wingo@pobox.com> skribis:

> This is in a UTF-8 locale.  OK.  So we have 10M "a" characters.  I now
> want to test these things:
>
>   1. peek-char, 1e7 times.
>   2. read-char, 1e7 times.
>   3. lookahead-u8, 1e7 times.  (Call it peek-byte.)
>   4. get-u8, 1e7 times.  (Call it read-byte.)
>
>                        | peek-char | read-char | peek-byte | read-byte
>   ---------------------+-----------+-----------+-----------+----------
>   2.0                  | 0.811s    | 0.711s    | 0.619s    | 0.623s
>   master               | 0.410s    | 0.331s    | 0.428s    | 0.411s
>   port-refactor C      | 0.333s    | 0.358s    | 0.265s    | 0.245s
>   port-refactor Scheme | 1.041s    | 1.820s    | 0.682s    | 0.727s
>
> Again, measurements on my i7-5600U, best of three, --no-debug.
>
> Conclusions:
>
>   1. In Guile master and 2.0, reading is faster than peeking, because it
>      does a read then a putback.  In wip-port-refactor, the reverse is
>      true: peeking fills the buffer, and reading advances the buffer
>      pointers.
>
>   2. Scheme appears to be about 3-4 times slower than C in
>      port-refactor.  It's slower than 2.0, unfortunately.  I am certain
>      that we will get the difference back when we get native compilation
>      but I don't know when that would be.
>
>   3. There are some compiler improvements that could help Scheme
>      performance too.  For example the bit that updates the port
>      positions is not optimal.  We could expose it from C of course.
>
> Note that this Scheme implementation passes ports.test, so there
> shouldn't be any hidden surprises.

Thanks for the thorough benchmarks!

My current inclination, based on this, would be to use the
“port-refactor C” version for 2.2, and save the Scheme variant for 2.4
maybe.

This is obviously frustrating, but I think we cannot afford to make I/O
slower than on 2.0, where it’s already too slow for some applications
IMO.

WDYT?

Regardless, your work in this area is just awesome!

Thanks,
Ludo’.




* Re: wip-ports-refactor
  2016-05-11 10:42     ` wip-ports-refactor Chris Vine
@ 2016-05-12  6:16       ` Andy Wingo
  0 siblings, 0 replies; 19+ messages in thread
From: Andy Wingo @ 2016-05-12  6:16 UTC (permalink / raw)
  To: Chris Vine; +Cc: guile-devel

On Wed 11 May 2016 12:42, Chris Vine <chris@cvine.freeserve.co.uk> writes:

> So you are saying that some parts of guile rely on the ordering
> guarantees of the x86 memory model (or something like it) with respect
> to atomic operations on some internal localised shared state?

Let's say you cons a fresh pair and pass it to another thread.  So first
Guile will allocate the pair then it will initialize the car and cdr
fields.  Does the other thread see the "car" and "cdr" values which the
first one set?

In Intel, yes.  But not all architectures are like that.  Storing a
value to the "car" of a pair might not imply any ordering with respect
to a read to that same memory location from another thread.  AFAIU
anyway.
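
A tiny sketch of the publication pattern in question (illustrative only;
whether the reader observes an initialized pair is exactly the
memory-model question):

    (use-modules (ice-9 threads))

    (define shared #f)

    ;; One thread conses a fresh pair and publishes it through a shared
    ;; variable; the other spins until it sees it and reads the car.
    ;; On x86 the reader sees 1; on weaker architectures nothing here
    ;; orders the initializing stores before the publishing store.
    (define writer
      (call-with-new-thread
       (lambda () (set! shared (cons 1 2)))))

    (define reader
      (call-with-new-thread
       (lambda ()
         (let spin ()
           (if shared (car shared) (spin))))))

    (join-thread writer)
    (join-thread reader)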

> Looking at the pthread related stuff in libguile, it seems to be
> written by someone/people who know what they are doing.  Are you
> referring specifically to the guile VM, and if so is guile-2.2 likely
> to be more problematic than guile-2.0?

I think Guile 2.2 is likely to be better if only because the port
situation is better there, and also weak maps are thread-safe (because
they lock now).  Otherwise no significant change.

Andy




* Re: wip-ports-refactor
  2016-05-11 14:23       ` wip-ports-refactor Ludovic Courtès
@ 2016-05-12  8:15         ` Andy Wingo
  0 siblings, 0 replies; 19+ messages in thread
From: Andy Wingo @ 2016-05-12  8:15 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

On Wed 11 May 2016 16:23, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> skribis:
>
>>                        | peek-char | read-char | peek-byte | read-byte
>>   ---------------------+-----------+-----------+-----------+----------
>>   2.0                  | 0.811s    | 0.711s    | 0.619s    | 0.623s
>>   master               | 0.410s    | 0.331s    | 0.428s    | 0.411s
>>   port-refactor C      | 0.333s    | 0.358s    | 0.265s    | 0.245s
>>   port-refactor Scheme | 1.041s    | 1.820s    | 0.682s    | 0.727s
>>
> My current inclination, based on this, would be to use the
> “port-refactor C” version for 2.2, and save the Scheme variant for 2.4
> maybe.
>
> This is obviously frustrating, but I think we cannot afford to make I/O
> slower than on 2.0, where it’s already too slow for some applications
> IMO.
>
> WDYT?

Humm, you might be right! I still have some questions though.

Before the questions, one point -- the "port refactor C" and "port
refactor Scheme" variants use the same underlying port structure.  The
Scheme variant just implements part of the port runtime in Scheme
instead of using the C runtime.  So, it would be possible to have the
Scheme version in a module in 2.2, if that were useful -- and I think
it's useful enough to enable green threads that suspend on I/O.

Regarding speed, you say 2.0 is too slow, but I could not reproduce that
in my initial benchmarks.  But, I can't be sure what you mean -- there
are different axes: whether you're processing input a byte at a
time, or in a block of bytes, whether you are decoding text or not, and
whether the port is buffered or not.  Do you recall any more details?
In my experience I never found 2.0 I/O to be too slow, but your mileage
evidently varies.

On all of these different axes, wip-port-refactor will be better because
it uniformly buffers the input through a standard buffer, which I/O
routines like get-bytevector can peek into directly.  This is true
whether the I/O routines are written in Scheme or in C.

The cases that I measure above are the worst cases for the difference
between C and Scheme, because they have all the overhead of a potential
slow-path that does a buffer fill and BOM handling and what-not but they
do very little I/O (just a byte or a char) and have no associated
workload (they do nothing with those bytes/chars).  Routines that read
multiple bytes at a time would not have as big a difference between
Scheme and C.

Furthermore with the uniform buffer, we can rewrite things like
read-line to peek directly into that buffer instead of doing a bunch of
read-char operations.  This would be big news for things like the web
server.  We could make this refactor either in Scheme or in C but I
suspect the performance would be similar, and Scheme is better than C
;-)
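
As a sketch of the kind of punch-through read-line could do, given the
buffer's bytevector and its cursors (how those would be obtained from
the port is left out; the helper itself is just illustrative):

    (use-modules (rnrs bytevectors))

    ;; Index of the next newline byte in BV between START and END, or
    ;; #f if it isn't buffered yet and the buffer needs another fill.
    (define (scan-for-newline bv start end)
      (let lp ((i start))
        (cond ((>= i end) #f)
              ((= (bytevector-u8-ref bv i) (char->integer #\newline)) i)
              (else (lp (+ i 1))))))

read-line would then take everything up to that index in one gulp and
only fall back to the slow path when the newline straddles a refill.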

But, probably it is time to step back a bit now that the core changes to
the ports infrastructure have been made and seem to be performant
enough.  I will see if I can manage to get the extra internal helpers
that I had to expose shunted off into a separate module that's not
exposed to (guile-user), and then if that's all looking good I'll
update documentation and NEWS and see about a release.

What think ye?

Andy



