thoughts on ports

* thoughts on ports
@ 2012-04-08 20:21 Andy Wingo
  2012-04-09 19:15 ` Mike Gran
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Andy Wingo @ 2012-04-08 20:21 UTC (permalink / raw)
  To: guile-devel

Hi all,

I have been thinking about ports recently.  I know other folks have had
some thoughts here too, so it's probably good to have a discussion about
how they should look.

I'm coming from the perspective of the recent (ice-9 eports) work on
wip-ethreads.  I found that it's fun and useful to be able to implement
ports in Scheme.  Fun, because it's Scheme; and useful, because you can
block via saving the (composable) continuation, adding to a poll loop,
and rescheduling.  There are also some potential optimizations when you
implement port operations in Scheme: for most users, who program in
Scheme, you cut out some layers.

It turns out that (ice-9 eports) doesn't actually have anything to do with
events, in the end -- having added a simple abstraction for
read/write/close operations, there is no fd-specific code in the eports
stuff.  Eports are more about efficiently and flexibly handling binary
input and output, with appropriate buffering.

That starts to raise the question of what the relationship of (ice-9
eports) is with our ports implemented in C (let's call them "cports"),
and the panoply of interfaces implemented for cports.

Obviously we need ports implemented in C because of bootstrapping
concerns.  But can we give Scheme access to buffering and the underlying
fill (read) / drain (write) / wait (select) operations?

So, the idea: refactor the port buffers (read, write, putback) to be
Scheme bytevectors, and internally store offsets instead of pointers.
Give access to some internal port primitives to a new (ice-9 ports)
module.
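
A minimal C sketch of the "offsets instead of pointers" idea, for illustration only. The names `port_buf`, `port_buf_init`, `port_buf_fill` and `port_buf_get` are hypothetical and are not part of Guile; a plain `malloc`ed array stands in for the bytevector. The point it tries to show is that positions kept as offsets remain valid when the storage is reallocated or moved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical read buffer: positions are offsets into `bytes`,
   never raw pointers, so they stay valid if the storage moves. */
struct port_buf {
  unsigned char *bytes;  /* backing storage (a bytevector in the proposal) */
  size_t size;           /* capacity of `bytes` */
  size_t cur;            /* offset of the next unread byte */
  size_t end;            /* offset one past the last buffered byte */
};

/* Allocate an empty buffer; returns 0 on success, -1 on failure. */
int port_buf_init(struct port_buf *b, size_t size) {
  b->bytes = malloc(size);
  if (b->bytes == NULL) return -1;
  b->size = size;
  b->cur = b->end = 0;
  return 0;
}

/* Append `n` bytes, growing the storage if needed. */
int port_buf_fill(struct port_buf *b, const void *src, size_t n) {
  if (b->end + n > b->size) {
    size_t new_size = (b->end + n) * 2;
    unsigned char *p = realloc(b->bytes, new_size);
    if (p == NULL) return -1;
    b->bytes = p;        /* storage may have moved; cur and end still apply */
    b->size = new_size;
  }
  memcpy(b->bytes + b->end, src, n);
  b->end += n;
  return 0;
}

/* Return the next byte, or -1 if the buffer is drained. */
int port_buf_get(struct port_buf *b) {
  assert(b->cur <= b->end);
  if (b->cur == b->end) return -1;
  return b->bytes[b->cur++];
}
```

With raw pointers, the read position would have to be recomputed after every `realloc`; with offsets, only the base pointer changes.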

I think we can manage to make (ice-9 ports) operate in both binary and
textual modes without a problem, just as we do with cports.  We'll have
to expose some iconv primitives to (ice-9 ports), but that's just as
well.  (Perhaps we should supply an (ice-9 iconv) module ?)

This is also our chance to modularize the ports code.  We can add module
autoloads to load up less-frequently-used parts of the ports interface
on demand.

Anyway, that's my current thought.  Again, the advantages: fewer layers
between Scheme and I/O, modularization, and the ability to suspend
blocking operations in user-space rather than kernel-space.

Thoughts?

Cheers,

Andy
-- 
http://wingolog.org/


* Re: thoughts on ports
  2012-04-08 20:21 thoughts on ports Andy Wingo
@ 2012-04-09 19:15 ` Mike Gran
  2012-04-09 20:21   ` Noah Lavine
  2012-04-10 22:11 ` Ludovic Courtès
  2012-04-11 18:36 ` Mark H Weaver
  2 siblings, 1 reply; 9+ messages in thread
From: Mike Gran @ 2012-04-09 19:15 UTC (permalink / raw)
  To: Andy Wingo, guile-devel

> From: Andy Wingo <wingo@pobox.com>
>  
>Hi all,
>
>I have been thinking about ports recently.  I know other folks have had
>some thoughts here too, so it's probably good to have a discussion about
>how they should look.

...

>Obviously we need ports implemented in C because of bootstrapping
>concerns.  But can we give Scheme access to buffering and the underlying
>fill (read) / drain (write) / wait (select) operations?
>
>So, the idea: refactor the port buffers (read, write, putback) to be
>Scheme bytevectors, and internally store offsets instead of pointers.
>Give access to some internal port primitives to a new (ice-9 ports)
>module.

Hi Andy,

I haven't looked at your eports, yet.

Let me just throw out some tangential info from when I played with
ports a couple of years ago.

The most awkward thing I did had to do with pushing back to
ports.  As you know, when you put back to a file port, you're not really
unreading the character from the device.  You're allocating a memory
buffer to pretend to be the device.  (At the time push back buffers were
8-bit locale encoded, so every unget operation involved a conversion.)

So, you're abstracting the unget operation so that any port can do it,
even a read-only file's port.

The C version of the reader used (uses?) pushback a lot.  At least once
per parenthesized expression.

Anyway, here's an idea.  Let's call the C code for ports 'base ports'.

1. Refactor the C reader so that it takes on the responsibility of
storing the put-back (ungotten?) characters.

2. This would let you simplify the base ports to make read, write,
and unget into those operations that the underlying device could
provide.  During bootstrap, a read-only base file port would error on
unget, for example.  Base ports would no longer have pushback buffers.
There would be no such thing as string base ports since they aren't
a device.

3. Then you could get through bootstrap and create scheme ports on top
of the base port primitives.
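
The split described in steps 1 and 2 can be sketched roughly as follows. This is an illustration under assumptions, not Guile code: `base_port`, `reader`, and the functions around them are hypothetical names, and a read-only byte array stands in for a file device. The base port exposes only what the device can do, and the reader keeps its own pushback stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "base port": only what the device offers.
   Here the device is a read-only byte array standing in for a file. */
struct base_port {
  const unsigned char *data;
  size_t len;
  size_t pos;
};

/* Read one byte from the device, or -1 at end of input. */
int base_read(struct base_port *p) {
  return p->pos < p->len ? p->data[p->pos++] : -1;
}

/* A read-only device cannot take bytes back, so unget always fails. */
int base_unget(struct base_port *p, int c) {
  (void) p;
  (void) c;
  return -1;
}

/* The reader owns the pushback storage instead of the port. */
#define READER_PUSHBACK_MAX 8
struct reader {
  struct base_port *port;
  int pushback[READER_PUSHBACK_MAX];
  size_t n_pushback;
};

/* Take from the reader's own pushback stack first, then the device. */
int reader_getc(struct reader *r) {
  assert(r->n_pushback <= READER_PUSHBACK_MAX);
  if (r->n_pushback > 0) return r->pushback[--r->n_pushback];
  return base_read(r->port);
}

/* Returns 0 on success, -1 if the pushback stack is full. */
int reader_ungetc(struct reader *r, int c) {
  if (r->n_pushback == READER_PUSHBACK_MAX) return -1;
  r->pushback[r->n_pushback++] = c;
  return 0;
}
```

In this arrangement the device position never moves backwards; a character that is read, put back, and read again touches the device only once.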

Just an idea.

-Mike


* Re: thoughts on ports
  2012-04-09 19:15 ` Mike Gran
@ 2012-04-09 20:21   ` Noah Lavine
  0 siblings, 0 replies; 9+ messages in thread
From: Noah Lavine @ 2012-04-09 20:21 UTC (permalink / raw)
  To: Mike Gran; +Cc: Andy Wingo, guile-devel

Hello,

On Mon, Apr 9, 2012 at 3:15 PM, Mike Gran <spk121@yahoo.com> wrote:
> Anyway, here's an idea.  Let's call the C code for ports 'base ports'.
>
> 1. Refactor the C reader so that it takes on the responsibility of
> storing the put-back (ungotten?) characters.
>
> 2. This would let you simplify the base ports to make read, write,
> and unget into those operations that the underlying device could
> provide.  During bootstrap, a read-only base file port would error on
> unget, for example.  Base ports would no longer have pushback buffers.
> There would be no such thing as string base ports since they aren't
> a device.
>
> 3. Then you could get through bootstrap and create scheme ports on top
> of the base port primitives.

Would it be possible to make this simpler (at least for now) by
changing steps 1 and 2 as follows?

1. Refactor ports code so read, write, unget are separate operations
with Scheme interfaces (where supported).

2. Define the standard Scheme port type in C, using the
read/write/unget primitives, so that the C reader can use it during
startup.

That gives you the flexibility of having the primitives around, but
lets you avoid the rewrite of the C reader.

Noah




* Re: thoughts on ports
  2012-04-08 20:21 thoughts on ports Andy Wingo
  2012-04-09 19:15 ` Mike Gran
@ 2012-04-10 22:11 ` Ludovic Courtès
  2013-01-17 14:40   ` Andy Wingo
  2012-04-11 18:36 ` Mark H Weaver
  2 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2012-04-10 22:11 UTC (permalink / raw)
  To: guile-devel

Hello!

Andy Wingo <wingo@pobox.com> skribis:

> Obviously we need ports implemented in C because of bootstrapping
> concerns.  But can we give Scheme access to buffering and the underlying
> fill (read) / drain (write) / wait (select) operations?
>
> So, the idea: refactor the port buffers (read, write, putback) to be
> Scheme bytevectors, and internally store offsets instead of pointers.
> Give access to some internal port primitives to a new (ice-9 ports)
> module.
>
> I think we can manage to make (ice-9 ports) operate in both binary and
> textual modes without a problem, just as we do with cports.  We'll have
> to expose some iconv primitives to (ice-9 ports), but that's just as
> well.  (Perhaps we should supply an (ice-9 iconv) module ?)

I like the idea (more Scheme!).  However, it’s not clear to me what the
performance impact would be with Guile’s current state.

For instance, while ‘read’ remains in C, it can only suffer from such a
change.  Conversely, things like ‘get-u8’ and ‘get-bytevector-n!’ may be
faster.  OTOH, the equivalent of ‘get_utf8_codepoint’ is likely to be
much slower.  And we still need to call out to C for ‘iconv’ and
libunistring.

My 2¢,
Ludo’.


* Re: thoughts on ports
  2012-04-08 20:21 thoughts on ports Andy Wingo
  2012-04-09 19:15 ` Mike Gran
  2012-04-10 22:11 ` Ludovic Courtès
@ 2012-04-11 18:36 ` Mark H Weaver
  2 siblings, 0 replies; 9+ messages in thread
From: Mark H Weaver @ 2012-04-11 18:36 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Hi Andy,

Andy Wingo <wingo@pobox.com> writes:
> I have been thinking about ports recently.  I know other folks have had
> some thoughts here too, so it's probably good to have a discussion about
> how they should look.

I apologize that I've not yet had time to properly look into your eports
work, so I'm afraid this email will not address your points adequately,
but nonetheless I've been thinking about the ports code sporadically for
many months, and here are my preliminary (and perhaps controversial :)
thoughts:

First of all, I think the ports code should be split into two layers:
the lower layer should implement only byte ports (i.e. binary I/O).
This layer should have multiple backends to support POSIX-style ports
(e.g. file and socket I/O) as well as bytevector ports and user-defined
"soft" byte ports.

The upper layer should instead offer a character-based interface
(probably unicode code-points in practice).  It should also support
multiple backends, including string ports, user-defined "soft" character
ports, and perhaps most importantly a transcoding port that provides a
character-based view of any arbitrary byte port, using a particular
encoding.
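
As a rough illustration of the two layers, here is a C sketch in which a byte port is a backend callback and a transcoding port decodes code points from any such byte port. All names (`byte_port`, `mem_state`, `utf8_port`, `utf8_read_char`) are hypothetical. Only UTF-8 is handled, and validation is deliberately minimal, so this is not a conforming decoder.

```c
#include <assert.h>
#include <stddef.h>

/* Lower layer: a byte port is a backend callback plus its state.
   Hypothetical interface; the callback returns a byte or -1 at EOF. */
struct byte_port {
  int (*read_byte)(void *state);
  void *state;
};

/* One possible backend: an in-memory byte array. */
struct mem_state {
  const unsigned char *data;
  size_t len, pos;
};

int mem_read_byte(void *state) {
  struct mem_state *m = state;
  assert(m->pos <= m->len);
  return m->pos < m->len ? m->data[m->pos++] : -1;
}

/* Upper layer: a transcoding port giving a code-point view of any
   byte port.  Overlong forms and surrogates are not rejected. */
struct utf8_port {
  struct byte_port *bytes;
};

/* Return the next code point, -1 at EOF, or -2 on malformed input. */
long utf8_read_char(struct utf8_port *p) {
  int b = p->bytes->read_byte(p->bytes->state);
  int extra;
  long cp;
  if (b < 0) return -1;
  if (b < 0x80) return b;                               /* ASCII */
  else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; }
  else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
  else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }
  else return -2;                                       /* bad lead byte */
  while (extra-- > 0) {
    b = p->bytes->read_byte(p->bytes->state);
    if (b < 0 || (b & 0xC0) != 0x80) return -2;         /* bad continuation */
    cp = (cp << 6) | (b & 0x3F);
  }
  return cp;
}
```

The encoding lives entirely in `utf8_port`; the byte port underneath knows nothing about characters, which is the property argued for in the next paragraph.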

IMO, port-encoding should not be a fundamental property of all ports.
It should _only_ be a property of transcoding ports.  For example,
SRFI-6 string ports should not have any encoding, nor should
user-defined "soft" character ports.  Transcoding ports should also
support mixed byte/character-based I/O.

If users want to do mixed-I/O on a string port, they can convert it to a
bytevector in the desired character encoding and then create a
transcoding bytevector port.

One more recommendation for efficient implementation: internal routines
should always work with _blocks_ of bytes or characters, rather than
individual bytes or characters.  Our current "soft" ports are hopelessly
inefficient for this reason, and they cannot be fixed without changing
their interface, so they should be deprecated and replaced.
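
To make the cost difference concrete, here is a small C sketch with a hypothetical backend (`counting_source`) that counts how many times it is entered. It does not model Guile's soft ports; it only contrasts one backend call per byte with one backend call per block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend that counts how often it is entered. */
struct counting_source {
  const unsigned char *data;
  size_t len, pos;
  unsigned long calls;
};

/* Block interface: copy up to `n` bytes into `dst`, return the count
   (0 means end of input). */
size_t source_read_block(struct counting_source *s,
                         unsigned char *dst, size_t n) {
  size_t avail;
  assert(s->pos <= s->len);
  avail = s->len - s->pos;
  if (n > avail) n = avail;
  memcpy(dst, s->data + s->pos, n);
  s->pos += n;
  s->calls++;
  return n;
}

/* Per-byte interface, expressed as a one-byte block read. */
int source_read_byte(struct counting_source *s) {
  unsigned char c;
  return source_read_block(s, &c, 1) == 1 ? c : -1;
}
```

Draining 1000 bytes one at a time enters the backend 1001 times (the last call reports end of input), while draining them through a 256-byte buffer enters it 5 times. When each entry is a call into user-supplied code, that ratio is the overhead in question.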

Regarding the idea of moving some of this code to Scheme, it sounds like
a great idea when we have good native code generation on the important
upcoming platforms (x86/arm/mips at the very least), but until then I
think our ports code had better stay in C, but with hooks to implement ports
in Scheme, analogous to our support for extensible numerics.

Nonetheless, I like the idea of using bytevectors for buffering in byte
ports, and offsets instead of pointers.

What do you think?

     Best,
      Mark


* Re: thoughts on ports
  2012-04-10 22:11 ` Ludovic Courtès
@ 2013-01-17 14:40   ` Andy Wingo
  2013-01-18 21:27     ` Ludovic Courtès
  0 siblings, 1 reply; 9+ messages in thread
From: Andy Wingo @ 2013-01-17 14:40 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Hi,

Again, picking up old things:

On Wed 11 Apr 2012 00:11, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> skribis:
>
>> Obviously we need ports implemented in C because of bootstrapping
>> concerns.  But can we give Scheme access to buffering and the underlying
>> fill (read) / drain (write) / wait (select) operations?
>>
>> So, the idea: refactor the port buffers (read, write, putback) to be
>> Scheme bytevectors, and internally store offsets instead of pointers.
>> Give access to some internal port primitives to a new (ice-9 ports)
>> module.
>>
>> I think we can manage to make (ice-9 ports) operate in both binary and
>> textual modes without a problem, just as we do with cports.  We'll have
>> to expose some iconv primitives to (ice-9 ports), but that's just as
>> well.  (Perhaps we should supply an (ice-9 iconv) module ?)
>
> I like the idea (more Scheme!).  However, it’s not clear to me what the
> performance impact would be with Guile’s current state.
>
> For instance, while ‘read’ remains in C, it can only suffer from such a
> change.  Conversely, things like ‘get-u8’ and ‘get-bytevector-n!’ may be
> faster.  OTOH, the equivalent of ‘get_utf8_codepoint’ is likely to be
> much slower.  And we still need to call out to C for ‘iconv’ and
> libunistring.

As a thought experiment, I don't see why things should have to slow
down.  Master has `scm_c_take_gc_bytevector', which can be used to wrap
the existing scm_t_port::write_buf, ::read_buf, and ::putback_buf
members.  At the cost of three allocations per port and three words per
allocation (bytevector tag, length, and pointer), we could give access
to these internal buffers to Scheme without affecting the C code at all.

We could go farther and allocate the buffers as bytevectors directly,
which would entail an additional indirection for C to get at the length
and data, but the length and data would all be contiguous anyway so in
practice I don't see it being too bad.
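
The two options can be pictured with the following C sketch. The layouts are hypothetical and are not Guile's actual bytevector representation; `wrapped_bv`, `inline_bv`, `BV_TAG`, and the two constructor functions are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BV_TAG 0x4d  /* made-up tag value for this sketch */

/* Option 1: a separate three-word header (tag, length, pointer)
   that points at a buffer the C code already owns. */
struct wrapped_bv {
  uintptr_t tag;
  size_t len;
  unsigned char *data;   /* the port's existing C buffer */
};

/* Option 2: header and contents in one allocation, so the length and
   the data are contiguous and C reaches both through one pointer. */
struct inline_bv {
  uintptr_t tag;
  size_t len;
  unsigned char data[];  /* C99 flexible array member */
};

/* Wrap an existing buffer without copying or moving it. */
struct wrapped_bv wrap_buffer(unsigned char *buf, size_t len) {
  struct wrapped_bv bv = { BV_TAG, len, buf };
  assert(buf != NULL || len == 0);
  return bv;
}

/* Allocate header and `len` bytes of storage together;
   returns NULL on allocation failure. */
struct inline_bv *make_inline_bv(size_t len) {
  struct inline_bv *bv = malloc(sizeof *bv + len);
  if (bv == NULL) return NULL;
  bv->tag = BV_TAG;
  bv->len = len;
  return bv;
}
```

In option 1 the C code keeps using its own buffer pointer unchanged. In option 2 it has to go through the bytevector object first, which is the additional indirection mentioned above.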

I'll see what I can do in a branch.

Andy
-- 
http://wingolog.org/


* Re: thoughts on ports
  2013-01-17 14:40   ` Andy Wingo
@ 2013-01-18 21:27     ` Ludovic Courtès
  2013-01-20 20:21       ` Andy Wingo
  0 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2013-01-18 21:27 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Hi!

Andy Wingo <wingo@pobox.com> skribis:

> As a thought experiment, I don't see why things should have to slow
> down.  Master has `scm_c_take_gc_bytevector', which can be used to wrap
> the existing scm_t_port::write_buf, ::read_buf, and ::putback_buf
> members.  At the cost of three allocations per port and three words per
> allocation (bytevector tag, length, and pointer), we could give access
> to these internal buffers to Scheme without affecting the C code at all.
>
> We could go farther and allocate the buffers as bytevectors directly,
> which would entail an additional indirection for C to get at the length
> and data, but the length and data would all be contiguous anyway so in
> practice I don't see it being too bad.

Yes, that seems doable.  What was the initial goal again?

Ludo’.




* Re: thoughts on ports
  2013-01-18 21:27     ` Ludovic Courtès
@ 2013-01-20 20:21       ` Andy Wingo
  2013-01-20 22:11         ` Ludovic Courtès
  0 siblings, 1 reply; 9+ messages in thread
From: Andy Wingo @ 2013-01-20 20:21 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

On Fri 18 Jan 2013 22:27, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> skribis:
>
>> As a thought experiment, I don't see why things should have to slow
>> down.  Master has `scm_c_take_gc_bytevector', which can be used to wrap
>> the existing scm_t_port::write_buf, ::read_buf, and ::putback_buf
>> members.  At the cost of three allocations per port and three words per
>> allocation (bytevector tag, length, and pointer), we could give access
>> to these internal buffers to Scheme without affecting the C code at all.
>>
>> We could go farther and allocate the buffers as bytevectors directly,
>> which would entail an additional indirection for C to get at the length
>> and data, but the length and data would all be contiguous anyway so in
>> practice I don't see it being too bad.
>
> Yes, that seems doable.  What was the initial goal again?

The goal was to be able to provide Scheme functions that operate on
ports, but that can suspend the operation via an abort-to-prompt if it
will block.  This can only be done if we are not recursing through C.
Exposing the fundamental buffers lets port operations be implemented
in Scheme (while also allowing the C implementation).
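
For contrast, here is what a suspendable read looks like when written by hand in C. This is an explicit state machine, not abort-to-prompt: the progress of the operation is hoisted into a struct (`read_op`, a hypothetical name) so that the function can return "would block" and be re-entered later. The simulated `source` and its `ready` flag are also assumptions made for this sketch. A Scheme implementation could instead capture this state in a continuation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum io_status { IO_DONE, IO_WOULD_BLOCK };

/* Hypothetical non-blocking source: delivers at most 2 bytes per call
   and then needs to be marked ready again (as a poll loop would). */
struct source {
  const unsigned char *data;
  size_t len, pos;
  int ready;
};

/* Returns the number of bytes copied (0 at EOF), or -1 for "would block". */
long source_try_read(struct source *s, unsigned char *dst, size_t n) {
  size_t avail;
  if (!s->ready) return -1;
  assert(s->pos <= s->len);
  avail = s->len - s->pos;
  if (n > avail) n = avail;
  if (n > 2) n = 2;
  memcpy(dst, s->data + s->pos, n);
  s->pos += n;
  s->ready = 0;
  return (long) n;
}

/* All progress lives here, so the operation survives returning early. */
struct read_op {
  unsigned char *dst;
  size_t want, got;
};

/* Advance the operation as far as possible without blocking. */
enum io_status read_op_step(struct read_op *op, struct source *s) {
  while (op->got < op->want) {
    long n = source_try_read(s, op->dst + op->got, op->want - op->got);
    if (n < 0) return IO_WOULD_BLOCK;  /* caller reschedules, then re-enters */
    if (n == 0) break;                 /* end of input */
    op->got += (size_t) n;
  }
  return IO_DONE;
}
```

Every operation that may block needs its own hand-written state struct in this style, and none of it works if a frame in the middle of the call stack cannot be re-entered.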

Andy
-- 
http://wingolog.org/




* Re: thoughts on ports
  2013-01-20 20:21       ` Andy Wingo
@ 2013-01-20 22:11         ` Ludovic Courtès
  0 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2013-01-20 22:11 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Andy Wingo <wingo@pobox.com> skribis:

> The goal was to be able to provide Scheme functions that operate on
> ports, but that can suspend the operation via an abort-to-prompt if it
> will block.  This can only be done if we are not recursing through C.
> Exposing the fundamental buffers lets port operations be implemented
> in Scheme (while also allowing the C implementation).

OK, I see, and it makes sense to me.

Ludo’.



