unofficial mirror of guile-devel@gnu.org 
* Reading data from a file descriptor
@ 2015-11-07 14:52 Jan Synáček
  2015-11-07 15:16 ` Artyom Poptsov
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Synáček @ 2015-11-07 14:52 UTC (permalink / raw)
  To: guile-devel

Hello Guilers,

how do I read data from a file descriptor? I already have an fd acquired
from elsewhere that I need to read data from, and I actually have no idea
how to do that. I read through the documentation on ports, but that didn't
help. The fd actually points to a socket.

In C, I have something like this:

    const size_t bufsize = 4096;
    char buf[bufsize+1];
    ssize_t count;
    int fd;

    fd = require_valid_fd();

    count = read(fd, buf, bufsize);
    if (count < 0)
        count = 0;  /* read failed; treat the buffer as empty */
    buf[count] = '\0';
    printf("read: %s\n", buf);

Cheers,
-- 
Jan Synáček


* Re: Reading data from a file descriptor
  2015-11-07 14:52 Reading data from a file descriptor Jan Synáček
@ 2015-11-07 15:16 ` Artyom Poptsov
  2015-11-07 15:29   ` Artyom Poptsov
  0 siblings, 1 reply; 22+ messages in thread
From: Artyom Poptsov @ 2015-11-07 15:16 UTC (permalink / raw)
  To: Jan Synáček; +Cc: guile-devel

Hello Jan,

do you need to read the data from the file descriptor in a Scheme
program?  If so, I guess you can make a wrapper procedure [1] that uses
'scm_fdes_to_port' from 'libguile.h' to convert your file descriptor to
a SCM port and return the port into the Scheme world.

Here's the description of 'scm_fdes_to_port' from GNU Guile 2.0.9
sources:

--8<---------------cut here---------------start------------->8---
/* Build a Scheme port from an open file descriptor `fdes'.
   MODE indicates whether FILE is open for reading or writing; it uses
      the same notation as open-file's second argument.
   NAME is a string to be used as the port's filename.
*/
SCM
scm_i_fdes_to_port (int fdes, long mode_bits, SCM name)
--8<---------------cut here---------------end--------------->8---

Hope this helps,

- Artyom

[1] See '(guile) C Extensions' chapter from the Guile manual.

-- 
Artyom V. Poptsov <poptsov.artyom@gmail.com>;  GPG Key: 0898A02F
Home page: http://poptsov-artyom.narod.ru/


* Re: Reading data from a file descriptor
  2015-11-07 15:16 ` Artyom Poptsov
@ 2015-11-07 15:29   ` Artyom Poptsov
  2015-11-07 23:49     ` Andreas Rottmann
  0 siblings, 1 reply; 22+ messages in thread
From: Artyom Poptsov @ 2015-11-07 15:29 UTC (permalink / raw)
  To: Jan Synáček; +Cc: guile-devel

Oh sorry, there was a mistake in my previous mail.  'scm_i_fdes_to_port'
is an internal procedure, but 'scm_fdes_to_port' is defined as Guile API
and should be available to Guile programs.  Here's its definition:

--8<---------------cut here---------------start------------->8---
SCM
scm_fdes_to_port (int fdes, char *mode, SCM name)
--8<---------------cut here---------------end--------------->8---

And the commentary that I quoted really applies to 'scm_fdes_to_port',
not to 'scm_i_fdes_to_port'.

- Artyom

-- 
Artyom V. Poptsov <poptsov.artyom@gmail.com>;  GPG Key: 0898A02F
Home page: http://poptsov-artyom.narod.ru/


* Re: Reading data from a file descriptor
  2015-11-07 15:29   ` Artyom Poptsov
@ 2015-11-07 23:49     ` Andreas Rottmann
  2015-11-09  7:25       ` Jan Synáček
  0 siblings, 1 reply; 22+ messages in thread
From: Andreas Rottmann @ 2015-11-07 23:49 UTC (permalink / raw)
  To: Artyom Poptsov; +Cc: guile-devel

Artyom Poptsov <poptsov.artyom@gmail.com> writes:

> Oh sorry, there was a mistake in my previous mail.  'scm_i_fdes_to_port'
> is an internal procedure, but 'scm_fdes_to_port' is defined as Guile API
> and should be available to Guile programs.  Here's its definition:
>
> SCM
> scm_fdes_to_port (int fdes, char *mode, SCM name)
>
> And the commentary that I quoted really applies to 'scm_fdes_to_port',
> not to 'scm_i_fdes_to_port'.
>
Also note that if there's no requirement to actually implement this in
C, there's `fdes->inport' and `fdes->outport' on the Scheme level, so
something like the following would be analogous to the C example code
posted:

    (import (ice-9 binary-ports))
    
    (define (process-fd fd)
      (let ((port (fdes->inport fd)))
        (display "read: ")
        (display (get-bytevector-n port 100))
        (display "\n")))

    (process-fd (acquire-valid-fd))

You could now just implement `acquire-valid-fd' in C and expose it to
Scheme, if that is even necessary. If you have the FD available via
e.g. an environment variable, `acquire-valid-fd' can be implemented in
Scheme as well.
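
For the environment-variable case, a minimal sketch could look like the
following. The variable name MY_SOCKET_FD is invented for illustration;
nothing in this thread says how the descriptor is actually handed over.

```scheme
;; Hypothetical: the parent process exports the descriptor number in
;; the environment variable MY_SOCKET_FD (a name made up for this sketch).
(define (acquire-valid-fd)
  (let* ((value (getenv "MY_SOCKET_FD"))
         (fd (and value (string->number value))))
    ;; Accept only a non-negative exact integer.
    (if (and fd (exact? fd) (integer? fd) (>= fd 0))
        fd
        (error "MY_SOCKET_FD does not hold a file descriptor:" value))))
```

The result can then be passed to `process-fd' as in the example above.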

Regards, Rotty
-- 
Andreas Rottmann -- <http://rotty.xx.vu/>




* Re: Reading data from a file descriptor
  2015-11-07 23:49     ` Andreas Rottmann
@ 2015-11-09  7:25       ` Jan Synáček
  2015-11-13 15:51         ` Mark H Weaver
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Synáček @ 2015-11-09  7:25 UTC (permalink / raw)
  To: Andreas Rottmann; +Cc: guile-devel

On Sun, Nov 8, 2015 at 12:49 AM, Andreas Rottmann <a.rottmann@gmx.at> wrote:

> Artyom Poptsov <poptsov.artyom@gmail.com> writes:
>
> > Oh sorry, there was a mistake in my previous mail.  'scm_i_fdes_to_port'
> > is an internal procedure, but 'scm_fdes_to_port' is defined as Guile API
> > and should be available to Guile programs.  Here's its definition:
> >
> > SCM
> > scm_fdes_to_port (int fdes, char *mode, SCM name)
> >
> > And the commentary that I quoted really applies to 'scm_fdes_to_port',
> > not to 'scm_i_fdes_to_port'.
> >
> Also note that if there's no requirement to actually implement this in
> C, there's `fdes->inport' and `fdes->outport' on the Scheme level, so
> something like the following would be analogous to the C example code
> posted:
>
>     (import (ice-9 binary-ports))
>
>     (define (process-fd fd)
>       (let ((port (fdes->inport fd)))
>         (display "read: ")
>         (display (get-bytevector-n port 100))
>         (display "\n")))
>
>     (process-fd (acquire-valid-fd))
>

This is very similar to what I ended up with. Just instead of
get-bytevector-n, I used read-string!/partial.


> You could now just implement `acquire-valid-fd' in C and expose it to
> Scheme, if that is even necessary. If you have the FD available via
> e.g. an environment variable, `acquire-valid-fd' can be implemented in
> Scheme as well.
>
> Regards, Rotty
> --
> Andreas Rottmann -- <http://rotty.xx.vu/>
>

Thank you for all the responses!
-- 
Jan Synáček


* Re: Reading data from a file descriptor
  2015-11-09  7:25       ` Jan Synáček
@ 2015-11-13 15:51         ` Mark H Weaver
  2015-11-13 20:41           ` Jan Synáček
  2015-11-16 13:02           ` tomas
  0 siblings, 2 replies; 22+ messages in thread
From: Mark H Weaver @ 2015-11-13 15:51 UTC (permalink / raw)
  To: Jan Synáček; +Cc: Andreas Rottmann, guile-devel

Jan Synáček <jan.synacek@gmail.com> writes:

> On Sun, Nov 8, 2015 at 12:49 AM, Andreas Rottmann <a.rottmann@gmx.at>
> wrote:
>
>     Also note that if there's no requirement to actually implement
>     this in
>     C, there's `fdes->inport' and `fdes->outport' on the Scheme level,
>     so
>     something like the following would be analogous to the C example
>     code
>     posted:
>     
>     (import (ice-9 binary-ports))
>     
>     (define (process-fd fd)
>     (let ((port (fdes->inport fd)))
>     (display "read: ")
>     (display (get-bytevector-n port 100))
>     (display "\n")))
>     
>     (process-fd (acquire-valid-fd))
>     
>
> This is something very similar that I ended up with. Just instead of
> get-byte-vector, I used read-string!/partial.

I would advise against using 'read-string!/partial' or any of the
procedures in (ice-9 rw).  This is a vestigial module from Guile 1.8
when strings were arrays of bytes, which they no longer are.  We should
probably mark them as deprecated.

For one thing, when we switch to using UTF-8 as the internal string
encoding, it will not be possible to keep 'read-string!/partial'
efficient.  It will necessarily have to do an encoding conversion.

In Guile 2+, I would advise using byte vectors when working with binary
data.  Portions of these can be converted to strings with a given
encoding if desired.  I might be able to give better advice if I knew
more about what you are doing here.
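
As a rough sketch of converting a portion of a bytevector to a string,
assuming the bytes are UTF-8 and that START and END fall on character
boundaries (the procedure name is made up for this example):

```scheme
(use-modules (rnrs bytevectors))

;; Decode the bytes of BV from index START (inclusive) to END
;; (exclusive) as UTF-8.  Hypothetical helper, not part of Guile.
(define (bytevector-slice->string bv start end)
  (let* ((count (- end start))
         (piece (make-bytevector count)))
    ;; R6RS argument order: source, source-start, target, target-start, count
    (bytevector-copy! bv start piece 0 count)
    (utf8->string piece)))
```

For other encodings, (ice-9 iconv) provides `bytevector->string', which
takes the encoding name as an argument.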

    Regards,
      Mark




* Re: Reading data from a file descriptor
  2015-11-13 15:51         ` Mark H Weaver
@ 2015-11-13 20:41           ` Jan Synáček
  2015-11-13 20:45             ` Thompson, David
  2015-11-16 10:54             ` Amirouche Boubekki
  2015-11-16 13:02           ` tomas
  1 sibling, 2 replies; 22+ messages in thread
From: Jan Synáček @ 2015-11-13 20:41 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-devel

On Fri, Nov 13, 2015 at 4:51 PM, Mark H Weaver <mhw@netris.org> wrote:
>
> Jan Synáček <jan.synacek@gmail.com> writes:
>
> > On Sun, Nov 8, 2015 at 12:49 AM, Andreas Rottmann <a.rottmann@gmx.at>
> > wrote:
> >
> >     Also note that if there's no requirement to actually implement
> >     this in
> >     C, there's `fdes->inport' and `fdes->outport' on the Scheme level,
> >     so
> >     something like the following would be analogous to the C example
> >     code
> >     posted:
> >
> >     (import (ice-9 binary-ports))
> >
> >     (define (process-fd fd)
> >     (let ((port (fdes->inport fd)))
> >     (display "read: ")
> >     (display (get-bytevector-n port 100))
> >     (display "\n")))
> >
> >     (process-fd (acquire-valid-fd))
> >
> >
> > This is something very similar that I ended up with. Just instead of
> > get-byte-vector, I used read-string!/partial.
>
> I would advise against using 'read-string!/partial' or any of the
> procedures in (ice-9 rw).  This is a vestigial module from Guile 1.8
> when strings were arrays of bytes, which they no longer are.  We should
> probably mark them as deprecated.
>
> For one thing, when we switch to using UTF-8 as the internal string
> encoding, it will not be possible to keep 'read-string!/partial'
> efficient.  It will necessarily have to do an encoding conversion.
>
> In Guile 2+, I would advise using byte vectors when working with binary
> data.  Portions of these can be converted to strings with a given
> encoding if desired.  I might be able to give better advice if I knew
> more about what you are doing here.
>
>     Regards,
>       Mark


I have an open fd to a unix socket and I want to read data from it. I
know that the data is going to be only strings, but I don't know the
length in advance. The good thing about using read-string!/partial is
that I don't have to specify how many bytes I want to read and it does
the right thing. If you can point me in a better direction, I'll be
grateful. I came up with:

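(use-modules (ice-9 rw))

(for-each (lambda (fd)
            (let* ((buf (make-string 4096))
                   ;; number of characters read, or #f at end of file
                   (count (read-string!/partial buf (fdes->inport fd))))
              ;; print only the part of buf that was actually filled
              (format #t "fd[~a]: ~a~%" fd (substring buf 0 (or count 0)))))
          fds)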

-- 
Jan Synáček




* Re: Reading data from a file descriptor
  2015-11-13 20:41           ` Jan Synáček
@ 2015-11-13 20:45             ` Thompson, David
  2015-11-15 11:09               ` Jan Synáček
  2015-11-16 10:54             ` Amirouche Boubekki
  1 sibling, 1 reply; 22+ messages in thread
From: Thompson, David @ 2015-11-13 20:45 UTC (permalink / raw)
  To: Jan Synáček; +Cc: Mark H Weaver, guile-devel

On Fri, Nov 13, 2015 at 3:41 PM, Jan Synáček <jan.synacek@gmail.com> wrote:
>
> I have an open fd to a unix socket and I want to read data from it. I
> know that the data is going to be only strings, but I don't know the
> length in advance. The good thing about using read-string!/partial is,
> that I don't have to specify how many bytes I want to read and it does
> the right thing. If you point me to a better direction, I'll be
> grateful. I came up with:
>
> (for-each (lambda (fd)
>             (let* ((buf (make-string 4096)))
>               (read-string!/partial buf (fdes->inport fd))
>               (format #t "fd[~a]: ~a" fd buf) (newline)))
>           fds)
>

Maybe 'read-string' in (ice-9 rdelim) is what you're after.

- Dave




* Re: Reading data from a file descriptor
  2015-11-13 20:45             ` Thompson, David
@ 2015-11-15 11:09               ` Jan Synáček
  2015-11-15 12:05                 ` Thompson, David
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Synáček @ 2015-11-15 11:09 UTC (permalink / raw)
  To: Thompson, David; +Cc: guile-devel

On Fri, Nov 13, 2015 at 9:45 PM, Thompson, David
<dthompson2@worcester.edu> wrote:
> On Fri, Nov 13, 2015 at 3:41 PM, Jan Synáček <jan.synacek@gmail.com> wrote:
>>
>> I have an open fd to a unix socket and I want to read data from it. I
>> know that the data is going to be only strings, but I don't know the
>> length in advance. The good thing about using read-string!/partial is,
>> that I don't have to specify how many bytes I want to read and it does
>> the right thing. If you point me to a better direction, I'll be
>> grateful. I came up with:
>>
>> (for-each (lambda (fd)
>>             (let* ((buf (make-string 4096)))
>>               (read-string!/partial buf (fdes->inport fd))
>>               (format #t "fd[~a]: ~a" fd buf) (newline)))
>>           fds)
>>
>
> Maybe 'read-string' in (ice-9 rdelim) is what you're after.
>
> - Dave

For some reason, 'read-string' blocks when I don't specify a small enough limit.

-- 
Jan Synáček




* Re: Reading data from a file descriptor
  2015-11-15 11:09               ` Jan Synáček
@ 2015-11-15 12:05                 ` Thompson, David
  0 siblings, 0 replies; 22+ messages in thread
From: Thompson, David @ 2015-11-15 12:05 UTC (permalink / raw)
  To: Jan Synáček; +Cc: guile-devel

On Sun, Nov 15, 2015 at 6:09 AM, Jan Synáček <jan.synacek@gmail.com> wrote:
> On Fri, Nov 13, 2015 at 9:45 PM, Thompson, David
> <dthompson2@worcester.edu> wrote:
>> On Fri, Nov 13, 2015 at 3:41 PM, Jan Synáček <jan.synacek@gmail.com> wrote:
>>>
>>> I have an open fd to a unix socket and I want to read data from it. I
>>> know that the data is going to be only strings, but I don't know the
>>> length in advance. The good thing about using read-string!/partial is,
>>> that I don't have to specify how many bytes I want to read and it does
>>> the right thing. If you point me to a better direction, I'll be
>>> grateful. I came up with:
>>>
>>> (for-each (lambda (fd)
>>>             (let* ((buf (make-string 4096)))
>>>               (read-string!/partial buf (fdes->inport fd))
>>>               (format #t "fd[~a]: ~a" fd buf) (newline)))
>>>           fds)
>>>
>>
>> Maybe 'read-string' in (ice-9 rdelim) is what you're after.
>>
>> - Dave
>
> For some reason, 'read-string' blocks when I don't specify a small enough limit.

That's how I/O operations work.  'read-string' blocks until the end of
file.  Guess I misunderstood what you're after.
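
To illustrate, here is a small sketch using a pipe instead of a socket
(an assumption made only to keep the example self-contained; the helper
name is made up).  'read-string' returns only because the write end is
closed first, which is what produces the end of file.  TEXT is assumed
to be small enough to fit in the pipe's buffer.

```scheme
(use-modules (ice-9 rdelim))

;; Write TEXT into a pipe, close the write end, then read everything
;; back with read-string.  Hypothetical helper for illustration only.
(define (read-all-after-close text)
  (let* ((p (pipe))
         (in (car p))
         (out (cdr p)))
    (display text out)
    (close-port out)                 ; flushes, and the reader now sees EOF
    (let ((result (read-string in))) ; returns once EOF is reached
      (close-port in)
      result)))
```

Without the `close-port' on the write end, the `read-string' call would
wait indefinitely, which matches what Jan observed on the socket.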

- Dave




* Re: Reading data from a file descriptor
  2015-11-13 20:41           ` Jan Synáček
  2015-11-13 20:45             ` Thompson, David
@ 2015-11-16 10:54             ` Amirouche Boubekki
  2015-11-17  9:53               ` tomas
  1 sibling, 1 reply; 22+ messages in thread
From: Amirouche Boubekki @ 2015-11-16 10:54 UTC (permalink / raw)
  To: Jan Synáček
  Cc: Mark H Weaver, guile-devel-bounces+amirouche+dev=hypermove.net,
	guile-devel

On 2015-11-13 21:41, Jan Synáček wrote:
> On Fri, Nov 13, 2015 at 4:51 PM, Mark H Weaver <mhw@netris.org> wrote:
>> 
>> Jan Synáček <jan.synacek@gmail.com> writes:
>> 
>> > On Sun, Nov 8, 2015 at 12:49 AM, Andreas Rottmann <a.rottmann@gmx.at>
>> > wrote:
>> >
>> >     Also note that if there's no requirement to actually implement
>> >     this in
>> >     C, there's `fdes->inport' and `fdes->outport' on the Scheme level,
>> >     so
>> >     something like the following would be analogous to the C example
>> >     code
>> >     posted:
>> >
>> >     (import (ice-9 binary-ports))
>> >
>> >     (define (process-fd fd)
>> >     (let ((port (fdes->inport fd)))
>> >     (display "read: ")
>> >     (display (get-bytevector-n port 100))
>> >     (display "\n")))
>> >
>> >     (process-fd (acquire-valid-fd))
>> >
>> >
>> > This is something very similar that I ended up with. Just instead of
>> > get-byte-vector, I used read-string!/partial.
>> 
>> I would advise against using 'read-string!/partial' or any of the
>> procedures in (ice-9 rw).  This is a vestigial module from Guile 1.8
>> when strings were arrays of bytes, which they no longer are.  We 
>> should
>> probably mark them as deprecated.
>> 
>> For one thing, when we switch to using UTF-8 as the internal string
>> encoding, it will not be possible to keep 'read-string!/partial'
>> efficient.  It will necessarily have to do an encoding conversion.
>> 
>> In Guile 2+, I would advise using byte vectors when working with 
>> binary
>> data.  Portions of these can be converted to strings with a given
>> encoding if desired.  I might be able to give better advice if I knew
>> more about what you are doing here.
>> 
>>     Regards,
>>       Mark
> 
> 
> I have an open fd to a unix socket and I want to read data from it. I
> know that the data is going to be only strings, but I don't know the
> length in advance.

Do you know a delimiter? Maybe it's the null char?

TCP is stream oriented; it's not structured at this layer into messages
or segments. You need some knowledge about the byte stream to be able to
split it into meaningful pieces for the upper layer.

In Python, the socket.recv method returns a bytestring, but you still
have to parse that bytestring, because a delimiter may fall in the
middle of it. What I mean is that, in theory, a single socket.recv call
can return the end of one application-level message and the beginning
of the next.

In other words, the only thing that would make sense for a (recv)
procedure is to return the full content of the inbound network buffer
for the given socket as a bytevector, but it will still require work to
make things work...


>  The good thing about using read-string!/partial is,
> that I don't have to specify how many bytes I want to read and it does
> the right thing. If you point me to a better direction, I'll be
> grateful. I came up with:
> 
> (for-each (lambda (fd)
>             (let* ((buf (make-string 4096)))
>               (read-string!/partial buf (fdes->inport fd))
>               (format #t "fd[~a]: ~a" fd buf) (newline)))
>           fds)

Have a look at the implementation. IMO the solution is to build a loop
with (read-char) [1] looking for the *end char* to stop the loop.

[1] https://www.gnu.org/software/guile/manual/html_node/Reading.html#index-read_002dchar
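
A minimal sketch of such a loop, assuming the delimiter is a single
known character (the procedure name is made up for this example):

```scheme
;; Read characters from PORT until END-CHAR or end of file is seen.
;; Returns the characters read so far as a string; END-CHAR itself is
;; consumed but not included.  Hypothetical helper.
(define (read-until-char port end-char)
  (let loop ((chars '()))
    (let ((c (read-char port)))
      (if (or (eof-object? c) (char=? c end-char))
          (list->string (reverse chars))
          (loop (cons c chars))))))
```

For example, (read-until-char port #\nul) would collect one
null-terminated message.  Note that `read-delimited' from (ice-9 rdelim)
already covers much of this.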




* Re: Reading data from a file descriptor
  2015-11-13 15:51         ` Mark H Weaver
  2015-11-13 20:41           ` Jan Synáček
@ 2015-11-16 13:02           ` tomas
  2015-11-23 21:07             ` Andreas Rottmann
  1 sibling, 1 reply; 22+ messages in thread
From: tomas @ 2015-11-16 13:02 UTC (permalink / raw)
  To: guile-devel

On Fri, Nov 13, 2015 at 10:51:58AM -0500, Mark H Weaver wrote:
> Jan Synáček <jan.synacek@gmail.com> writes:
> 
> > On Sun, Nov 8, 2015 at 12:49 AM, Andreas Rottmann <a.rottmann@gmx.at>
> > wrote:
> >
> >     Also note that if there's no requirement to actually implement
> >     this in
> >     C, there's `fdes->inport' and `fdes->outport' on the Scheme level,
> >     so
> >     something like the following would be analogous to the C example
> >     code
> >     posted:
> >     
> >     (import (ice-9 binary-ports))
> >     
> >     (define (process-fd fd)
> >     (let ((port (fdes->inport fd)))
> >     (display "read: ")
> >     (display (get-bytevector-n port 100))
> >     (display "\n")))
> >     
> >     (process-fd (acquire-valid-fd))
> >     
> >
> > This is something very similar that I ended up with. Just instead of
> > get-byte-vector, I used read-string!/partial.
> 
> I would advise against using 'read-string!/partial' or any of the
> procedures in (ice-9 rw).  This is a vestigial module from Guile 1.8
> when strings were arrays of bytes, which they no longer are.  We should
> probably mark them as deprecated.
> 
> For one thing, when we switch to using UTF-8 as the internal string
> encoding, it will not be possible to keep 'read-string!/partial'
> efficient.  It will necessarily have to do an encoding conversion.
> 
> In Guile 2+, I would advise using byte vectors when working with binary
> data.  Portions of these can be converted to strings with a given
> encoding if desired.  I might be able to give better advice if I knew
> more about what you are doing here.

Mark,

what Jan is after (and what I'd like to have too) is something
akin to Unix read(2) with O_NONBLOCK: provide a buffer, request
(up to) N bytes from the file (descriptor) and get an answer
(with possibly fewer bytes).

I tried that a while ago and was surprised that I had to resort
to (character) strings, with all the downsides you mention. Something
like that for byte vectors would be awesome. Either it exists (and
neither Jan nor I have succeeded in finding it) or it doesn't.

I'll go have a look later to see whether my recollection is accurate.

Regards
- -- tomás

* Re: Reading data from a file descriptor
  2015-11-16 10:54             ` Amirouche Boubekki
@ 2015-11-17  9:53               ` tomas
  2015-11-17 12:59                 ` Chris Vine
  0 siblings, 1 reply; 22+ messages in thread
From: tomas @ 2015-11-17  9:53 UTC (permalink / raw)
  To: guile-devel

On Mon, Nov 16, 2015 at 11:54:33AM +0100, Amirouche Boubekki wrote:
> On 2015-11-13 21:41, Jan Synáček wrote:

[...]

> >I have an open fd to a unix socket and I want to read data from it. I
> >know that the data is going to be only strings, but I don't know the
> >length in advance.
> 
> Do you know a delimiter? maybe it's the null char?
> 
> TCP is stream oriented, it's not structured at this layer into messages
> or segments. You need some knowledge about the byte stream to be able to
> split it into different meaningful piece for the upper layer.

I think I "got" Jan's request, because I've been in a similar
situation before: delimiter is not (yet) part of it. What he's
looking for is an interface à la read(2), meaning "gimme as much
as there is in the queue, up to N bytes, and tell me how much
you gave me". Of course, putting stuff in a byte vector would
be preferable; the only functions I've seen[1] which "do" that
interface are read-string!/partial and write-string/partial, and they
operate on strings, not byte arrays, alas.

> In Python the socket.recv method returns bytestring but still you
> have to parse
> to bytestring to make sure the delimiter is not in the middle of the
> string. What
> I mean is that in theory you might socket.recv the end of an
> application level message
> and the beginning of another using the same call.

Not (yet) about delimiters.

> Otherwise said, the only thing that could make sens for a (recv)
> procedure is
> to return the full content of the inbound network buffer for the
> given socket
> as a bytevector but it will still require work to to make things work...

Exactly, see above. A delimiter, or just feed the thing to an
incremental scanner/parser (e.g. a finite state machine, which
can get its input spoonwise).

> Have a look at the implementation. IMO the solution is to build a
> loop with
> (read-char) [1] looking for the *end char* to stop the loop.

If your application is set up to accept partial buffers of arbitrary
length, it seems a bit wasteful to have a buffered read (which is
definitely beneath `read-char') going through a character-wise
read into a new buffered read...

regards
- -- tomás

* Re: Reading data from a file descriptor
  2015-11-17 12:59                 ` Chris Vine
@ 2015-11-17 12:52                   ` tomas
  2015-11-17 13:55                     ` Chris Vine
  2015-11-18  8:28                   ` Jan Synáček
  1 sibling, 1 reply; 22+ messages in thread
From: tomas @ 2015-11-17 12:52 UTC (permalink / raw)
  To: guile-devel

On Tue, Nov 17, 2015 at 12:59:56PM +0000, Chris Vine wrote:
> On Tue, 17 Nov 2015 10:53:19 +0100

[...]

> guile's R6RS implementation has get-bytevector-some, which will do that
> for you, with unix-read-like behaviour.

Thank you a thousand. You made me happy :-)

> You cannot use this for UTF-8 text by trying to convert the bytevector
> with utf8->string, because you could have received a partially formed
> utf-8 character.

I understand that. If you want to do that you need an incremental
converter, like the one provided by iconv(3).

>                  So for text, you should use line orientated reading,
> such as with ice-9 read-line or R6RS get-line.

Not necessarily, see above. Perhaps just sub-streams of your input are
UTF-8, or whatever; but yes, the parsing responsibility is then with
the user, by whatever means.
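
One way to take on that responsibility by hand, sketched here under the
assumption that the whole stream is well-formed UTF-8: work out how much
of a chunk ends on a character boundary, decode that prefix, and keep
the remaining bytes for the next chunk.  Both procedure names are made
up for this example.

```scheme
(use-modules (rnrs bytevectors))

;; Number of bytes in a UTF-8 sequence, judged from its first byte;
;; #f for a continuation byte (10xxxxxx).
(define (utf8-sequence-length byte)
  (cond ((< byte #x80) 1)
        ((< byte #xC0) #f)
        ((< byte #xE0) 2)
        ((< byte #xF0) 3)
        (else 4)))

;; Length of the longest prefix of BV that does not stop in the middle
;; of a character.  Only the last four bytes are inspected; anything
;; malformed is left for the decoder to complain about.
(define (utf8-complete-prefix-length bv)
  (let ((len (bytevector-length bv)))
    (let loop ((i (- len 1)))
      (cond ((or (< i 0) (< i (- len 4))) len)
            ((utf8-sequence-length (bytevector-u8-ref bv i))
             => (lambda (need)
                  ;; The last lead byte sits at index I; cut there if
                  ;; its sequence runs past the end of BV.
                  (if (> (+ i need) len) i len)))
            (else (loop (- i 1)))))))
```

The bytes from that length up to the end of the chunk would be prepended
to the next chunk before decoding it.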

Thank you again. I was looking for that function all in the wrong
places :-/

Regards
- -- tomás

* Re: Reading data from a file descriptor
  2015-11-17  9:53               ` tomas
@ 2015-11-17 12:59                 ` Chris Vine
  2015-11-17 12:52                   ` tomas
  2015-11-18  8:28                   ` Jan Synáček
  0 siblings, 2 replies; 22+ messages in thread
From: Chris Vine @ 2015-11-17 12:59 UTC (permalink / raw)
  To: guile-devel

On Tue, 17 Nov 2015 10:53:19 +0100
<tomas@tuxteam.de> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On Mon, Nov 16, 2015 at 11:54:33AM +0100, Amirouche Boubekki wrote:
> > On 2015-11-13 21:41, Jan Synáček wrote:  
> 
> [...]
> 
> > >I have an open fd to a unix socket and I want to read data from
> > >it. I know that the data is going to be only strings, but I don't
> > >know the length in advance.  
> > 
> > Do you know a delimiter? maybe it's the null char?
> > 
> > TCP is stream oriented, it's not structured at this layer into
> > messages or segments. You need some knowledge about the byte stream
> > to be able to split it into different meaningful piece for the
> > upper layer.  
> 
> I think I "got" Jan's request, because I've been in a similar
> situation before: delimiter is not (yet) part of it. What he's
> looking for is an interface à la read(2), meaning "gimme as much
> as there is in the queue, up to N bytes, and tell me how much
> you gave me". Of course, putting stuff in a byte vector would
> be preferable; the only functions I've seen[1] which "do" that
> interface are read-string!/partial and write-string/partial
> operate on strings, not byte arrays, alas.

guile's R6RS implementation has get-bytevector-some, which will do that
for you, with unix-read-like behaviour.

You cannot use this for UTF-8 text by trying to convert the bytevector
with utf8->string, because you could have received a partially formed
utf-8 character.  So for text, you should use line orientated reading,
such as with ice-9 read-line or R6RS get-line.
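
A minimal sketch of a read loop built on get-bytevector-some (the
procedure name `for-each-chunk' is made up for this example).  Each call
hands back whatever bytes are available as a fresh bytevector, or the
end-of-file object once the other side has closed:

```scheme
(use-modules (ice-9 binary-ports)
             (rnrs bytevectors))

;; Call PROC on every chunk of bytes that arrives on PORT until end of
;; file.  Chunk sizes depend on what is available at each call.
(define (for-each-chunk proc port)
  (let loop ()
    (let ((chunk (get-bytevector-some port)))
      (unless (eof-object? chunk)
        (proc chunk)
        (loop)))))
```

With a descriptor, PORT would be something like (fdes->inport fd), and
(bytevector-length chunk) plays the role of the count returned by
read(2).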

Chris




* Re: Reading data from a file descriptor
  2015-11-17 13:55                     ` Chris Vine
@ 2015-11-17 13:33                       ` tomas
  2016-06-20 10:40                       ` Andy Wingo
  1 sibling, 0 replies; 22+ messages in thread
From: tomas @ 2015-11-17 13:33 UTC (permalink / raw)
  To: guile-devel

On Tue, Nov 17, 2015 at 01:55:17PM +0000, Chris Vine wrote:
> On Tue, 17 Nov 2015 13:52:21 +0100
> <tomas@tuxteam.de> wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> > 
> > On Tue, Nov 17, 2015 at 12:59:56PM +0000, Chris Vine wrote:
> > > On Tue, 17 Nov 2015 10:53:19 +0100  
> > 
> > [...]
> > 
> > > guile's R6RS implementation has get-bytevector-some, which will do
> > > that for you, with unix-read-like behaviour.  
> > 
> > Thank you a thousand. You made me happy :-)
> 
> I suppose it is worth adding that it might not be optimally efficient
> for all uses, as there is no get-bytevector-some! procedure which
> modifies an existing bytevector and takes a maximum length value.  I
> guess it is a matter of 'suck it and see', efficiency-wise.
> 
> If you are sending/receiving binary packets, it might be better to make
> them of fixed size and use get-bytevector-n!.  (Unfortunately,
> get-bytevector-n! does block until n is fulfilled according to R6RS:
> "The get-bytevector-n! procedure reads from binary-input-port, blocking
> as necessary, until count bytes are available from binary-input-port or
> until an end of file is reached".)

:-(

As I noted before, it's a while since I attempted that. I was looking
for an equivalent of read(2) and write(2): simple, efficient, easy to
understand semantics (if you discount the EOF problem for now).

Perhaps the limitations you mention above steered me towards
read-string!/partial and friends, then.

Thanks & regards
- -- tomás

* Re: Reading data from a file descriptor
  2015-11-17 12:52                   ` tomas
@ 2015-11-17 13:55                     ` Chris Vine
  2015-11-17 13:33                       ` tomas
  2016-06-20 10:40                       ` Andy Wingo
  0 siblings, 2 replies; 22+ messages in thread
From: Chris Vine @ 2015-11-17 13:55 UTC (permalink / raw)
  To: guile-devel

On Tue, 17 Nov 2015 13:52:21 +0100
<tomas@tuxteam.de> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On Tue, Nov 17, 2015 at 12:59:56PM +0000, Chris Vine wrote:
> > On Tue, 17 Nov 2015 10:53:19 +0100  
> 
> [...]
> 
> > guile's R6RS implementation has get-bytevector-some, which will do
> > that for you, with unix-read-like behaviour.  
> 
> Thank you a thousand. You made me happy :-)

I suppose it is worth adding that it might not be optimally efficient
for all uses, as there is no get-bytevector-some! procedure which
modifies an existing bytevector and takes a maximum length value.  I
guess it is a matter of 'suck it and see', efficiency-wise.

If you are sending/receiving binary packets, it might be better to make
them of fixed size and use get-bytevector-n!.  (Unfortunately,
get-bytevector-n! does block until n is fulfilled according to R6RS:
"The get-bytevector-n! procedure reads from binary-input-port, blocking
as necessary, until count bytes are available from binary-input-port or
until an end of file is reached".)

Chris
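
For illustration only (this sketch is not from the message above), here is
one way the fixed-size approach could look with get-bytevector-n! and a
reused bytevector. The packet size of 4, the helper name read-packet!, and
the in-memory port standing in for a socket are all assumptions.

```scheme
(use-modules (rnrs bytevectors)
             (ice-9 binary-ports))

;; Hypothetical fixed packet size, chosen only for this example.
(define packet-size 4)

;; Fill BUF with one packet from PORT.  get-bytevector-n! blocks until
;; PACKET-SIZE bytes have arrived or end of file is reached, and returns
;; the number of bytes stored (or the EOF object).  The result here is
;; #t only for a complete packet.
(define (read-packet! port buf)
  (let ((n (get-bytevector-n! port buf 0 packet-size)))
    (and (not (eof-object? n))
         (= n packet-size))))

;; An in-memory port stands in for the socket: two full packets
;; followed by one stray byte.
(define port (open-bytevector-input-port #vu8(1 2 3 4 5 6 7 8 9)))
(define buf (make-bytevector packet-size 0))

;; The same buffer is reused for every packet, so nothing is allocated
;; per read.
(define first-ok? (read-packet! port buf))
```

A short count or EOF makes read-packet! return #f, so the caller can tell a
truncated packet from a complete one.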




* Re: Reading data from a file descriptor
  2015-11-17 12:59                 ` Chris Vine
  2015-11-17 12:52                   ` tomas
@ 2015-11-18  8:28                   ` Jan Synáček
  1 sibling, 0 replies; 22+ messages in thread
From: Jan Synáček @ 2015-11-18  8:28 UTC (permalink / raw)
  To: Chris Vine; +Cc: guile-devel

On Tue, Nov 17, 2015 at 1:59 PM, Chris Vine <chris@cvine.freeserve.co.uk> wrote:
> On Tue, 17 Nov 2015 10:53:19 +0100
> <tomas@tuxteam.de> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On Mon, Nov 16, 2015 at 11:54:33AM +0100, Amirouche Boubekki wrote:
>> > On 2015-11-13 21:41, Jan Synáček wrote:
>>
>> [...]
>>
>> > >I have an open fd to a unix socket and I want to read data from
>> > >it. I know that the data is going to be only strings, but I don't
>> > >know the length in advance.
>> >
>> > Do you know a delimiter? maybe it's the null char?
>> >
>> > TCP is stream oriented, it's not structured at this layer into
>> > messages or segments. You need some knowledge about the byte stream
>> > to be able to split it into different meaningful piece for the
>> > upper layer.
>>
>> I think I "got" Jan's request, because I've been in a similar
>> situation before: delimiter is not (yet) part of it. What he's
>> looking for is an interface à la read(2), meaning "gimme as much
>> as there is in the queue, up to N bytes, and tell me how much
>> you gave me". Of course, putting stuff in a byte vector would
>> be preferable; the only functions I've seen[1] which "do" that
>> interface, read-string!/partial and write-string/partial,
>> operate on strings, not byte arrays, alas.
>
> guile's R6RS implementation has get-bytevector-some, which will do that
> for you, with unix-read-like behaviour.
>
> You cannot use this for UTF-8 text by trying to convert the bytevector
> with utf8->string, because you could have received a partially formed
> utf-8 character.  So for text, you should use line orientated reading,
> such as with ice-9 read-line or R6RS get-line.
>
> Chris

This seems to be exactly what I'm looking for, thank you!

-- 
Jan Synáček
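
For text, the line-oriented reading suggested in the quoted message could
be sketched as follows. This is a minimal example under assumptions: a pipe
stands in for the socket, both ends are set to UTF-8, and the sample text
is made up.

```scheme
(use-modules (ice-9 rdelim))

;; A pipe stands in for the socket; `pipe' returns a pair of ports,
;; (input . output).
(define p (pipe))
(define in (car p))
(define out (cdr p))

;; Let the ports do the UTF-8 decoding, so a multi-byte character is
;; never handed over half-read.
(set-port-encoding! in "UTF-8")
(set-port-encoding! out "UTF-8")

;; "cafe" with an e-acute (U+00E9), which takes two bytes in UTF-8.
(define sample (string-append "caf" (string (integer->char #xe9))))

(display sample out)
(newline out)
(force-output out)          ; pipe output is buffered, so flush it

;; read-line returns the text up to the newline, without the newline.
(define line (read-line in))
```

Because decoding happens in the port, the caller only ever sees whole
characters, which is the point of preferring read-line (or R6RS get-line)
over utf8->string on an arbitrary chunk of bytes.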




* Re: Reading data from a file descriptor
  2015-11-16 13:02           ` tomas
@ 2015-11-23 21:07             ` Andreas Rottmann
  2015-11-24 15:28               ` tomas
  0 siblings, 1 reply; 22+ messages in thread
From: Andreas Rottmann @ 2015-11-23 21:07 UTC (permalink / raw)
  To: tomas; +Cc: guile-devel

<tomas@tuxteam.de> writes:

> On Fri, Nov 13, 2015 at 10:51:58AM -0500, Mark H Weaver wrote:
>> Jan Synáček <jan.synacek@gmail.com> writes:
>> 
>> > On Sun, Nov 8, 2015 at 12:49 AM, Andreas Rottmann <a.rottmann@gmx.at>
>> > wrote:
>> >
>> >     Also note that if there's no requirement to actually implement this
>> >     in C, there's `fdes->inport' and `fdes->outport' on the Scheme level,
>> >     so something like the following would be analogous to the C example
>> >     code posted:
>> >     
>> >     (import (ice-9 binary-ports))
>> >     
>> >     (define (process-fd fd)
>> >       (let ((port (fdes->inport fd)))
>> >         (display "read: ")
>> >         (display (get-bytevector-n port 100))
>> >         (display "\n")))
>> >     
>> >     (process-fd (acquire-valid-fd))
>> >     
>> >
>> > This is very similar to what I ended up with. Just instead of
>> > get-bytevector-n, I used read-string!/partial.
>> 
>> I would advise against using 'read-string!/partial' or any of the
>> procedures in (ice-9 rw).  This is a vestigial module from Guile 1.8
>> when strings were arrays of bytes, which they no longer are.  We should
>> probably mark them as deprecated.
>> 
>> For one thing, when we switch to using UTF-8 as the internal string
>> encoding, it will not be possible to keep 'read-string!/partial'
>> efficient.  It will necessarily have to do an encoding conversion.
>> 
>> In Guile 2+, I would advise using byte vectors when working with binary
>> data.  Portions of these can be converted to strings with a given
>> encoding if desired.  I might be able to give better advice if I knew
>> more about what you are doing here.
>
> Mark,
>
> what Jan is after (and what I'd like to have too) is something
> akin to Unix read(2) with O_NONBLOCK: provide a buffer, request
> (up to) N bytes from the file (descriptor) and get an answer
> (with possibly fewer bytes).
>
> I tried that a while ago and was surprised that I had to resort
> to (character) strings, with all the downsides you mention. Something
> like that for byte vectors would be awesome. Either it exists (and
> neither Jan nor I have succeeded in finding it) or it doesn't.
>
The procedure with the closest semantics is R6RS
`get-bytevector-some`. While the R6RS says it will block if no data is
available, a quick look at Guile source code seems to indicate that it
probably works with non-blocking I/O -- I'd say it should return EOF if
called on a non-readable, non-blocking port, and otherwise not block,
and return the data available. This is all just from a quick
inspection, without running any actual code.

Regards, Rotty
-- 
Andreas Rottmann -- <http://rotty.xx.vu/>
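
A minimal way to try get-bytevector-some, assuming a pipe is an acceptable
stand-in for the socket. This sketch covers only the blocking case with
data already waiting; it does not settle the non-blocking behaviour
discussed above.

```scheme
(use-modules (rnrs bytevectors)
             (ice-9 binary-ports))

;; A pipe stands in for the socket; `pipe' returns a pair of ports,
;; (input . output).
(define p (pipe))
(define in (car p))
(define out (cdr p))

;; Write five bytes and flush them, since pipe output is buffered.
(put-bytevector out (string->utf8 "hello"))
(force-output out)

;; Like read(2), get-bytevector-some returns whatever is available
;; instead of waiting for a fixed count.  The result is a fresh
;; bytevector, or the EOF object at end of file.
(define chunk (get-bytevector-some in))

(display "read: ")
(display (bytevector-length chunk))
(display " bytes\n")
```

The caller does not pass a buffer or a maximum length, so each call
allocates a new bytevector whose size is decided by the port.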




* Re: Reading data from a file descriptor
  2015-11-23 21:07             ` Andreas Rottmann
@ 2015-11-24 15:28               ` tomas
  0 siblings, 0 replies; 22+ messages in thread
From: tomas @ 2015-11-24 15:28 UTC (permalink / raw)
  Cc: guile-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, Nov 23, 2015 at 10:07:11PM +0100, Andreas Rottmann wrote:
> <tomas@tuxteam.de> writes:

[...]

> > what Jan is after (and what I'd like to have too) is something
> > akin to Unix read(2) with O_NONBLOCK: [...]

> The procedure with the closest semantics is R6RS
> `get-bytevector-some`. While the R6RS says it will block if no data is
> available, a quick look at Guile source code seems to indicate that it
> probably works with non-blocking I/O -- I'd say it should return EOF if
> called on a non-readable, non-blocking port, and otherwise not block,
> and return the data available. This is all just from a quick
> inspection, without running any actual code.

Thanks a bunch for looking into it. I'll give it a try and report back.

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlZUgjsACgkQBcgs9XrR2kbScACdH6hoWVVX6m6oCk1O3Fq+S1Pn
EI4AnRvOO3QSBMq/GmU8Mzctm4VliTMe
=Nyl4
-----END PGP SIGNATURE-----




* Re: Reading data from a file descriptor
  2015-11-17 13:55                     ` Chris Vine
  2015-11-17 13:33                       ` tomas
@ 2016-06-20 10:40                       ` Andy Wingo
  2016-06-20 10:58                         ` tomas
  1 sibling, 1 reply; 22+ messages in thread
From: Andy Wingo @ 2016-06-20 10:40 UTC (permalink / raw)
  To: Chris Vine; +Cc: guile-devel

On Tue 17 Nov 2015 14:55, Chris Vine <chris@cvine.freeserve.co.uk> writes:

> On Tue, 17 Nov 2015 13:52:21 +0100
>> On Tue, Nov 17, 2015 at 12:59:56PM +0000, Chris Vine wrote:
>> > On Tue, 17 Nov 2015 10:53:19 +0100  
>> 
>> [...]
>> 
>> > guile's R6RS implementation has get-bytevector-some, which will do
>> > that for you, with unix-read-like behaviour.  
>> 
>> Thank you a thousand. You made me happy :-)
>
> I suppose it is worth adding that it might not be optimally efficient
> for all uses, as there is no get-bytevector-some! procedure which
> modifies an existing bytevector and takes a maximum length value.  I
> guess it is a matter of 'suck it and see', efficiency-wise.

I would be happy to support such an interface though.  I guess it would
take a keyword or optional argument indicating a minimum number of bytes
to fill, and if that number is 0 it would never block; sound about
right?

Andy




* Re: Reading data from a file descriptor
  2016-06-20 10:40                       ` Andy Wingo
@ 2016-06-20 10:58                         ` tomas
  0 siblings, 0 replies; 22+ messages in thread
From: tomas @ 2016-06-20 10:58 UTC (permalink / raw)
  To: guile-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, Jun 20, 2016 at 12:40:53PM +0200, Andy Wingo wrote:
> On Tue 17 Nov 2015 14:55, Chris Vine <chris@cvine.freeserve.co.uk> writes:
> 
> > On Tue, 17 Nov 2015 13:52:21 +0100
> >> On Tue, Nov 17, 2015 at 12:59:56PM +0000, Chris Vine wrote:

[...]

> >> > guile's R6RS implementation has get-bytevector-some, which will do
> >> > that for you, with unix-read-like behaviour.  

[...]

> > I suppose it is worth adding that it might not be optimally efficient
> > for all uses, as there is no get-bytevector-some! procedure which
> > modifies an existing bytevector and takes a maximum length value.  I
> > guess it is a matter of 'suck it and see', efficiency-wise.
> 
> I would be happy to support such an interface though.  I guess it would
> take a keyword or optional argument indicating a minimum number of bytes
> to fill, and if that number is 0 it would never block; sound about
> right?

Assuming I understood everything involved -- yes, it sounds spot-on.

Thanks
- - t
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAldnzF4ACgkQBcgs9XrR2kbbkQCeJtkFrSRYBz9UTX9+pO1v7E3r
hKYAn03UD5dhTe9hPpYusaBqTYOu2MbW
=WJhe
-----END PGP SIGNATURE-----




Thread overview: 22+ messages
2015-11-07 14:52 Reading data from a file descriptor Jan Synáček
2015-11-07 15:16 ` Artyom Poptsov
2015-11-07 15:29   ` Artyom Poptsov
2015-11-07 23:49     ` Andreas Rottmann
2015-11-09  7:25       ` Jan Synáček
2015-11-13 15:51         ` Mark H Weaver
2015-11-13 20:41           ` Jan Synáček
2015-11-13 20:45             ` Thompson, David
2015-11-15 11:09               ` Jan Synáček
2015-11-15 12:05                 ` Thompson, David
2015-11-16 10:54             ` Amirouche Boubekki
2015-11-17  9:53               ` tomas
2015-11-17 12:59                 ` Chris Vine
2015-11-17 12:52                   ` tomas
2015-11-17 13:55                     ` Chris Vine
2015-11-17 13:33                       ` tomas
2016-06-20 10:40                       ` Andy Wingo
2016-06-20 10:58                         ` tomas
2015-11-18  8:28                   ` Jan Synáček
2015-11-16 13:02           ` tomas
2015-11-23 21:07             ` Andreas Rottmann
2015-11-24 15:28               ` tomas
