unofficial mirror of guile-user@gnu.org 
From: Nala Ginrut <nalaginrut@gmail.com>
To: Panicz Maciej Godek <godek.maciek@gmail.com>
Cc: "guile-user@gnu.org" <guile-user@gnu.org>
Subject: Re: Converting a part of byte vector to UTF-8 string
Date: Wed, 15 Jan 2014 12:59:16 +0800
Message-ID: <1389761956.20078.27.camel@Renee-desktop.suse>
In-Reply-To: <CAMFYt2a77QRanD=p9viKsZLHpDLAvnB453r5QTdv3f699b-KTw@mail.gmail.com>

hi there!

On Tue, 2014-01-14 at 00:17 +0100, Panicz Maciej Godek wrote:
> Another option would be to use
> (substring (utf8->string buffer) 0 n)
> 
> This one works, but according to the manual, the
> string is "newly allocated", so it's unnecessary overhead.
> 

Actually, substring is COW (copy-on-write), so you don't have to
worry. You may also try substring/shared, which shares memory with the
original string instead of copying it.
But please be careful about the side effects in your context ;-)
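
To make the difference concrete, here is a minimal sketch of how the two
procedures behave under mutation, as I understand them from the Guile
manual. The variable names are made up for illustration.

```scheme
;; string-copy gives us a mutable string to play with
;; (string literals may be read-only).
(define original (string-copy "hello world"))

;; substring is copy-on-write: mutating the result leaves ORIGINAL alone.
(define copy (substring original 0 5))
(string-set! copy 0 #\Y)

;; substring/shared keeps sharing storage: the mutation shows through
;; in ORIGINAL as well.
(define shared (substring/shared original 0 5))
(string-set! shared 0 #\J)

(display copy) (newline)
(display shared) (newline)
(display original) (newline)
```

So substring/shared is the cheaper of the two, but any later string-set!
on either string is visible through the other.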

> What would be the best solution?
> 

IMO, no matter whether you use substring or substring/shared in this
context, you have to allocate a new string. The reason is that we don't
have anything like bytevector/shared.

But IIRC a bytevector in Guile is similar to a C array, which means you
can avoid any allocation when slicing a bytevector, provided you
handle the array pointer properly.
So one may take advantage of that.
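
As a rough sketch of slicing through the pointer, the following builds
a view on part of a bytevector without copying its contents. It relies
on bytevector->pointer accepting an optional byte offset and on
pointer->bytevector from (system foreign); the name bytevector-view is
my own, not something Guile provides.

```scheme
(use-modules (system foreign)     ; bytevector->pointer, pointer->bytevector
             (rnrs bytevectors))  ; u8-list->bytevector, bytevector-u8-ref

;; Hypothetical helper: a bytevector aliasing LEN bytes of BV,
;; starting at byte offset START. No bounds checking is done here.
(define (bytevector-view bv start len)
  (pointer->bytevector (bytevector->pointer bv start) len))

(define bv (u8-list->bytevector '(1 2 3 4 5)))
(define view (bytevector-view bv 1 3))

;; The view aliases BV, so writing through it changes BV too.
(bytevector-u8-set! view 0 42)
(display (bytevector-u8-ref bv 1)) (newline)
```

The aliasing is exactly the side effect to watch out for, and it is
probably wise to keep a reference to the original bytevector for as
long as the view is in use.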

But I can't say you can avoid allocation when you convert a bytevector
to a string, because both utf8->string and pointer->string will allocate
anyway.

(Anyone correct me please if I'm wrong!)

Here's my black magic:
-------------------------------cut------------------------------
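(use-modules (system foreign)     ; bytevector->pointer, pointer->string
             (rnrs bytevectors))  ; bytevector-length, string->utf8

;; START and END are character indices; SIZE is the number of bytes
;; per character, assumed to be the same for every character.
(define* (bv->string/partly bv #:optional (start 0)
                                          (end #f)
                                          (size 1)
                                          (encoding "utf-8"))
  (let ((len (if end
                 (* size (- end start))   ; byte length of the slice
                 (- (bytevector-length bv) (* size start))))
        ;; address of the first byte of the slice
        (addr (+ (pointer-address (bytevector->pointer bv))
                 (* size start))))
    ;; pointer->string decodes LEN bytes into a newly allocated string
    (pointer->string (make-pointer addr) len encoding)))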
-------------------------------end--------------------------------

(define bv (string->utf8 "我了个去啊"))
;; NOTE: each of these Chinese characters takes 3 bytes in UTF-8, so size is 3
(bv->string/partly bv 2 4 3)
==> "个去"

;; And for plain ASCII characters, whose size is 1
(define bv2 (string->utf8 "hello world"))
(bv->string/partly bv2 0 5)
==> "hello"
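
Note that a fixed size only works when every character in the range has
the same byte width. For mixed text, one option is to find the byte
offset of a character index by skipping UTF-8 continuation bytes (those
of the form 10xxxxxx). A small sketch, assuming the bytevector holds
valid UTF-8; the helper name is hypothetical:

```scheme
(use-modules (rnrs bytevectors))  ; bytevector-u8-ref, string->utf8

;; Hypothetical helper: byte offset of the INDEX-th character in BV.
;; Returns the bytevector length when INDEX is past the last character.
(define (utf8-char-index->byte-offset bv index)
  (let loop ((i 0) (chars 0))
    (cond ((= i (bytevector-length bv)) i)
          ;; continuation byte: still inside the previous character
          ((= (logand (bytevector-u8-ref bv i) #xC0) #x80)
           (loop (+ i 1) chars))
          ;; lead byte of the character we are looking for
          ((= chars index) i)
          ;; lead byte of an earlier character
          (else (loop (+ i 1) (+ chars 1))))))

(define mixed (string->utf8 "a我b"))
(display (utf8-char-index->byte-offset mixed 2)) (newline)  ; prints 4
```

The resulting byte offsets can then be used with size left at 1.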


But I have to give a warning again: when you try to avoid allocation
overhead, you have to face the risk of side effects. Personally, I'd
prefer the purely functional way. ;-P
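
For comparison, a side-effect-free version could simply copy the bytes
it needs and decode the copy. This is only a sketch: the name
bv-slice->string is made up, and start and end are byte offsets here,
not character indices.

```scheme
(use-modules (rnrs bytevectors))  ; make-bytevector, bytevector-copy!, utf8->string

;; Hypothetical helper: decode bytes START (inclusive) to END (exclusive)
;; of BV as UTF-8. It allocates a temporary bytevector and a new string,
;; but shares nothing with BV afterwards.
(define (bv-slice->string bv start end)
  (let* ((len (- end start))
         (tmp (make-bytevector len)))
    (bytevector-copy! bv start tmp 0 len)
    (utf8->string tmp)))

(display (bv-slice->string (string->utf8 "hello world") 0 5))  ; prints hello
(newline)
```

It pays for two allocations, but nothing done to the result can affect
the original bytevector.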

> TIA
> M




