unofficial mirror of guile-user@gnu.org 
From: Mark H Weaver <mhw@netris.org>
To: Panicz Maciej Godek <godek.maciek@gmail.com>
Cc: guile-user@gnu.org
Subject: Re: Converting a part of byte vector to UTF-8 string
Date: Wed, 15 Jan 2014 13:29:55 -0500
Message-ID: <877ga1umho.fsf@netris.org>
In-Reply-To: <CAMFYt2YhcRdUvQ3_zTXhgMZOQMMgyp2h-9grR2xLpgwAyvPU9g@mail.gmail.com> (Panicz Maciej Godek's message of "Wed, 15 Jan 2014 16:27:50 +0100")

Panicz Maciej Godek <godek.maciek@gmail.com> writes:

> Your solution seems reasonable, but I have found another way, which
> led me to some new problems.
> I realised that since sockets are ports in guile, I could process them
> with the plain "read" (which is what I have been using them for
> anyway).
>
> However, this approach caused some new problems. The thing is that if
> I'm trying to read some message from port, and that message does not
> end with a delimiter (like a whitespace or a balancing, closing
> parenthesis), then the read would wait forever, possibly gluing its
> arguments.
>
> The solution I came up with is through soft ports. The idea is to have
> a port proxy, that -- if it would block -- would return an eof-object
> instead.

This is terribly inefficient, and also not robust.  Guile's native soft
ports cannot read efficiently, because they transfer data one character
at a time.  Also, Guile's 'char-ready?' currently does the job of
'u8-ready?', i.e. it only checks whether a _byte_ is available, not a
whole character, so the 'read-char' might still block.  Finally, on a
socket the data may simply not have arrived yet because of network
latency, in which case your proxy would generate a spurious EOF.


To offer my own answer to your original question: R7RS-small provides an
API that does precisely what you asked for.  Its 'utf8->string'
procedure accepts optional 'start' and 'end' byte positions.  I
implemented this on the 'r7rs-wip' branch of Guile git as follows:

http://git.savannah.gnu.org/gitweb/?p=guile.git;a=blob;f=module/scheme/base.scm;h=f110d4c2b241ec0941b4223cece05c309db5308a;hb=r7rs-wip#l327

  (import (rename (rnrs bytevectors)
                  (utf8->string      r6rs-utf8->string)
                  (string->utf8      r6rs-string->utf8)
                  (bytevector-copy   r6rs-bytevector-copy)
                  (bytevector-copy!  r6rs-bytevector-copy!)))

  [...]

  (define bytevector-copy
    (case-lambda
      ((bv)
       (r6rs-bytevector-copy bv))
      ((bv start)
       (let* ((len (- (bytevector-length bv) start))
              (result (make-bytevector len)))
         (r6rs-bytevector-copy! bv start result 0 len)
         result))
      ((bv start end)
       (let* ((len (- end start))
              (result (make-bytevector len)))
         (r6rs-bytevector-copy! bv start result 0 len)
         result))))

  (define utf8->string
    (case-lambda
      ((bv) (r6rs-utf8->string bv))
      ((bv start)
       (r6rs-utf8->string (bytevector-copy bv start)))
      ((bv start end)
       (r6rs-utf8->string (bytevector-copy bv start end)))))
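
As a minimal usage sketch (not part of the original message), the same
copy-then-decode idea can be tried in a Guile that lacks the R7RS
'utf8->string', using only (rnrs bytevectors).  The helper name
'utf8-slice->string' and the sample string are made up for illustration.
'start' and 'end' are byte positions, not character positions, and the
sketch assumes both fall on character boundaries.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper, not a Guile API: decode the bytes of BV in the
;; half-open range [START, END) as UTF-8.  START defaults to 0 and END
;; to the length of BV.
(define* (utf8-slice->string bv
                             #:optional
                             (start 0)
                             (end (bytevector-length bv)))
  (let* ((len   (- end start))
         (slice (make-bytevector len)))
    ;; R6RS argument order: source, source-start, target, target-start, count.
    (bytevector-copy! bv start slice 0 len)
    (utf8->string slice)))

;; "héllo wörld" occupies 13 bytes in UTF-8, because 'é' and 'ö' take
;; two bytes each.
(define sample (string->utf8 "héllo wörld"))

(define first-word  (utf8-slice->string sample 0 6))   ; bytes 0..5
(define second-word (utf8-slice->string sample 7))     ; bytes 7..12
```

Here 'first-word' is "héllo" and 'second-word' is "wörld".  Like the
code above, this copies the selected bytes before decoding them, so it
costs one extra allocation per call.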



Thread overview: 4+ messages
2014-01-13 23:17 Converting a part of byte vector to UTF-8 string Panicz Maciej Godek
2014-01-15  4:59 ` Nala Ginrut
2014-01-15 15:27   ` Panicz Maciej Godek
2014-01-15 18:29     ` Mark H Weaver [this message]
