From: Mark H Weaver <mhw@netris.org>
To: Panicz Maciej Godek <godek.maciek@gmail.com>
Cc: guile-user@gnu.org
Subject: Re: Converting a part of byte vector to UTF-8 string
Date: Wed, 15 Jan 2014 13:29:55 -0500
Message-ID: <877ga1umho.fsf@netris.org>
In-Reply-To: <CAMFYt2YhcRdUvQ3_zTXhgMZOQMMgyp2h-9grR2xLpgwAyvPU9g@mail.gmail.com> (Panicz Maciej Godek's message of "Wed, 15 Jan 2014 16:27:50 +0100")

Panicz Maciej Godek <godek.maciek@gmail.com> writes:

> Your solution seems reasonable, but I have found another way, which
> led me to some new problems.
> I realised that since sockets are ports in guile, I could process them
> with the plain "read" (which is what I have been using them for
> anyway).
>
> However, this approach caused some new problems. The thing is that if
> I'm trying to read some message from port, and that message does not
> end with a delimiter (like a whitespace or a balancing, closing
> parenthesis), then the read would wait forever, possibly gluing its
> arguments.
>
> The solution I came up with is through soft ports. The idea is to have
> a port proxy, that -- if it would block -- would return an eof-object
> instead.

This is terribly inefficient, and also not robust.  Guile's native soft
ports do not support efficient reading, because everything is one
character at a time.  Also, Guile's 'char-ready?' currently does the job
of 'u8-ready?', i.e. it only checks if a _byte_ is available, not a
whole character, so the 'read-char' might still block.  Anyway, if this
is a socket, what if the data isn't available simply because of network
latency?  Then you'll generate a spurious EOF.

To offer my own answer to your original question: R7RS-small provides an
API that does precisely what you asked for.  Its 'utf8->string'
procedure accepts optional 'start' and 'end' byte positions.  I
implemented this on the 'r7rs-wip' branch of Guile git as follows:

  http://git.savannah.gnu.org/gitweb/?p=guile.git;a=blob;f=module/scheme/base.scm;h=f110d4c2b241ec0941b4223cece05c309db5308a;hb=r7rs-wip#l327

  (import (rename (rnrs bytevectors)
                  (utf8->string r6rs-utf8->string)
                  (string->utf8 r6rs-string->utf8)
                  (bytevector-copy r6rs-bytevector-copy)
                  (bytevector-copy! r6rs-bytevector-copy!)))

  [...]

  (define bytevector-copy
    (case-lambda
      ((bv)
       (r6rs-bytevector-copy bv))
      ((bv start)
       (let* ((len (- (bytevector-length bv) start))
              (result (make-bytevector len)))
         (r6rs-bytevector-copy! bv start result 0 len)
         result))
      ((bv start end)
       (let* ((len (- end start))
              (result (make-bytevector len)))
         (r6rs-bytevector-copy! bv start result 0 len)
         result))))

  (define utf8->string
    (case-lambda
      ((bv) (r6rs-utf8->string bv))
      ((bv start)
       (r6rs-utf8->string (bytevector-copy bv start)))
      ((bv start end)
       (r6rs-utf8->string (bytevector-copy bv start end)))))
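
For illustration only, here is a minimal standalone sketch of the same
copy-then-decode idea, using just the R6RS procedures from
(rnrs bytevectors).  The helper name 'utf8-slice->string' and the sample
data are hypothetical and are not part of the branch above.  Note that
'start' and 'end' are byte offsets, not character indices, so they
should fall on character boundaries.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper (not from the r7rs-wip branch): decode the bytes
;; of BV in the half-open range [START, END) as UTF-8.
(define (utf8-slice->string bv start end)
  (let* ((len (- end start))
         (slice (make-bytevector len)))
    ;; R6RS argument order: source, source-start, target, target-start, count.
    (bytevector-copy! bv start slice 0 len)
    (utf8->string slice)))

;; "λ" occupies two bytes in UTF-8, so the 10-character string below
;; encodes to 11 bytes, and "calculus" starts at byte offset 3.
(define bv (string->utf8 "λ-calculus"))

(utf8-slice->string bv 0 2)   ; the two bytes that encode "λ"
(utf8-slice->string bv 3 11)  ; the eight bytes that encode "calculus"
```

Like the definitions above, this copies the requested range into a fresh
bytevector before decoding, which keeps the code simple at the cost of
one extra allocation per call.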