From: Nala Ginrut
Newsgroups: gmane.lisp.guile.user
Subject: Re: Converting a part of byte vector to UTF-8 string
Date: Wed, 15 Jan 2014 12:59:16 +0800
Organization: HFG
Message-ID: <1389761956.20078.27.camel@Renee-desktop.suse>
To: Panicz Maciej Godek
Cc: "guile-user@gnu.org"

Hi there!

On Tue, 2014-01-14 at 00:17 +0100, Panicz Maciej Godek wrote:
> Another option would be to use
> (substring (utf8->string buffer 0 n))
>
> This one works, but according to the manual, the
> string is "newly allocated", so it's unnecessary overhead.
>

Actually, substring is COW (copy-on-write), so you don't have to worry: the
character data is only copied if one of the two strings is modified later.
You may also try substring/shared, which never copies the character data at
all. But please be careful about the side effects in your context ;-)

> What would be the best solution?
>

IMO, no matter whether you use substring or substring/shared in this context,
you have to allocate a new string. The reason is that we don't have something
like bytevector/shared.
But IIRC a bytevector in Guile is similar to a C array, which means you can
avoid any allocation when slicing a bytevector, provided you handle the array
pointer properly. So one may take advantage of that.
!!But I can't say you can avoid allocation when you convert a bytevector to a
string, because either utf8->string or pointer->string will allocate anyway.
(Anyone, please correct me if I'm wrong!)

Here's my black magic:
-------------------------------cut------------------------------
(use-modules (rnrs bytevectors) ; bytevector-length, string->utf8
             (system foreign))  ; to handle the C pointer

;; START and END are character indices; SIZE is the number of bytes per
;; character and must be the same for every character involved.
(define* (bv->string/partly bv #:optional (start 0) (end #f) (size 1)
                            (encoding "utf-8"))
  (let ((len (if end
                 (* size (- end start))
                 (- (bytevector-length bv) (* size start))))
        ;; Pointer to the first byte of the slice; the offset is in bytes.
        (ptr (bytevector->pointer bv (* size start))))
    (pointer->string ptr len encoding)))
-------------------------------end--------------------------------

(define bv (string->utf8 "我了个去啊"))
;; NOTE: these Chinese characters need size==3
(bv->string/partly bv 2 4 3)
==> "个去"

;; And for common Latin characters, whose size==1
(define bv2 (string->utf8 "hello world"))
(bv->string/partly bv2 0 5)
==> "hello"

But I have to give a warning again: when you try to avoid allocation overhead,
you have to face the risk of side effects.
To me, I'd prefer pure-functional. ;-P

> TIA
> M
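
P.S. To make the side-effect warning about substring versus substring/shared
concrete, here is a minimal sketch, assuming Guile 2.0 or later. The variable
names original, cow-part and shared-part are made up for illustration only.
It mutates the source string after taking both kinds of substring and shows
which one notices the change:

```scheme
;; string-copy gives us a mutable string (literals may be read-only).
(define original (string-copy "hello world"))

;; Copy-on-write substring: shares storage until one side is modified.
(define cow-part (substring original 0 5))

;; Shared substring: keeps sharing storage even after modification.
(define shared-part (substring/shared original 0 5))

;; Mutate the source string.
(string-set! original 0 #\J)

(display cow-part) (newline)     ; prints hello
(display shared-part) (newline)  ; prints Jello
```

So substring behaves like an independent copy, while substring/shared lets
later writes to the original leak into the slice (and the other way round).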