From: Mark H Weaver
Newsgroups: gmane.lisp.guile.user
Subject: Re: Converting a part of byte vector to UTF-8 string
Date: Wed, 15 Jan 2014 13:29:55 -0500
Message-ID: <877ga1umho.fsf@netris.org>
References: <1389761956.20078.27.camel@Renee-desktop.suse>
To: Panicz Maciej Godek
Cc: guile-user@gnu.org
In-Reply-To: (Panicz Maciej Godek's message of "Wed, 15 Jan 2014 16:27:50 +0100")

Panicz Maciej Godek writes:

> Your solution seems reasonable, but I have found another way, which
> lead me to some new problems.
> I realised that since sockets are ports in guile, I could process them
> with the plain "read" (which is what I have been using them for
> anyway).
>
> However, this approach caused some new problems. The thing is that if
> I'm trying to read some message from port, and that message does not
> end with a delimiter (like a whitespace or a balancing, closing
> parenthesis), then the read would wait forever, possibly gluing its
> arguments.
>
> The solution I came up with is through soft ports. The idea is to have
> a port proxy, that -- if it would block -- would return an eof-object
> instead.

This is terribly inefficient, and also not robust.  Guile's native soft
ports do not support efficient reading, because everything is one
character at a time.  Also, Guile's 'char-ready?' currently does the job
of 'u8-ready?', i.e. it only checks if a _byte_ is available, not a
whole character, so the 'read-char' might still block.

Anyway, if this is a socket, what if the data isn't available simply
because of network latency?  Then you'll generate a spurious EOF.

To offer my own answer to your original question: R7RS-small provides an
API that does precisely what you asked for.
Its 'utf8->string' procedure accepts optional 'start' and 'end' byte
positions.  I implemented this on the 'r7rs-wip' branch of Guile git as
follows:

  http://git.savannah.gnu.org/gitweb/?p=guile.git;a=blob;f=module/scheme/base.scm;h=f110d4c2b241ec0941b4223cece05c309db5308a;hb=r7rs-wip#l327

  (import (rename (rnrs bytevectors)
                  (utf8->string r6rs-utf8->string)
                  (string->utf8 r6rs-string->utf8)
                  (bytevector-copy r6rs-bytevector-copy)
                  (bytevector-copy! r6rs-bytevector-copy!)))

  [...]

  (define bytevector-copy
    (case-lambda
      ((bv)
       (r6rs-bytevector-copy bv))
      ((bv start)
       (let* ((len (- (bytevector-length bv) start))
              (result (make-bytevector len)))
         (r6rs-bytevector-copy! bv start result 0 len)
         result))
      ((bv start end)
       (let* ((len (- end start))
              (result (make-bytevector len)))
         (r6rs-bytevector-copy! bv start result 0 len)
         result))))

  (define utf8->string
    (case-lambda
      ((bv)
       (r6rs-utf8->string bv))
      ((bv start)
       (r6rs-utf8->string (bytevector-copy bv start)))
      ((bv start end)
       (r6rs-utf8->string (bytevector-copy bv start end)))))
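
For illustration, here is a minimal sketch of the same copy-then-decode
idea that uses only the R6RS '(rnrs bytevectors)' module, so it should
not depend on the 'r7rs-wip' branch.  The helper name
'utf8-slice->string' and the sample data are made up for this example.
Note that 'start' and 'end' are byte offsets, not character offsets, and
the sketch assumes they fall on UTF-8 character boundaries.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper (not part of Guile): decode the bytes of BV in
;; the half-open range [START, END) as UTF-8.  The range is copied into
;; a fresh bytevector first, as in the definitions above.
(define (utf8-slice->string bv start end)
  (let* ((len (- end start))
         (slice (make-bytevector len)))
    (bytevector-copy! bv start slice 0 len)
    (utf8->string slice)))

;; Sample data: "caf" + U+00E9 + " au lait".  U+00E9 is built with
;; integer->char so that the source file stays pure ASCII.
(define message
  (string->utf8
   (string-append "caf" (string (integer->char #xE9)) " au lait")))

(bytevector-length message)                        ; => 13
(string-length (utf8-slice->string message 0 5))   ; => 4 (5 bytes, 4 chars)
(utf8-slice->string message 6 13)                  ; => "au lait"
```

The first slice covers five bytes but yields four characters, because
U+00E9 takes two bytes in UTF-8.  A range that cuts through the middle
of a multi-byte character would hand invalid UTF-8 to the decoder, so a
caller has to know where the character boundaries are.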