From: Nala Ginrut <nalaginrut@gmail.com>
To: Panicz Maciej Godek <godek.maciek@gmail.com>
Cc: "guile-user@gnu.org" <guile-user@gnu.org>
Subject: Re: Converting a part of byte vector to UTF-8 string
Date: Wed, 15 Jan 2014 12:59:16 +0800
Message-ID: <1389761956.20078.27.camel@Renee-desktop.suse>
In-Reply-To: <CAMFYt2a77QRanD=p9viKsZLHpDLAvnB453r5QTdv3f699b-KTw@mail.gmail.com>
Hi there!
On Tue, 2014-01-14 at 00:17 +0100, Panicz Maciej Godek wrote:
> Another option would be to use
> (substring (utf8->string buffer) 0 n)
>
> This one works, but according to the manual, the
> string is "newly allocated", so it's unnecessary overhead.
>
Actually, substring is COW (copy-on-write), so you don't have to
worry. You may also try substring/shared, which shares storage with the
original string and never copies the characters.
But please be careful about the side effects in your context ;-)
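To make the difference concrete, here is a minimal sketch of the two
procedures, based on the behaviour the Guile manual documents for them.
The variable names are only for illustration, and string-copy is used
because string literals are read-only:

```scheme
;; substring: copy-on-write, so a later write to the source is not visible.
(define s1 (string-copy "hello world"))
(define cow (substring s1 0 5))
(string-set! s1 0 #\J)
cow      ; still "hello"

;; substring/shared: storage stays shared, so the write shows through.
(define s2 (string-copy "hello world"))
(define shared (substring/shared s2 0 5))
(string-set! s2 0 #\J)
shared   ; now "Jello"
```

The second case is the side effect I mean: whoever holds the shared
substring sees every later modification of the original string.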
> What would be the best solution?
>
IMO, no matter whether you use substring or substring/shared in this
context, you have to allocate a new string. The reason is that we don't
have something like bytevector/shared.
But IIRC a bytevector in Guile is similar to a C array, which means you
can avoid any allocation when you slice a bytevector, provided you
handle the array pointer properly.
So one may take advantage of that.
But I can't say you can avoid allocation when you convert a bytevector
to a string, because both utf8->string and pointer->string will allocate
anyway.
(Anyone please correct me if I'm wrong!)
Here's my black magic:
-------------------------------cut------------------------------
(use-modules (system foreign)     ; to handle the C pointer
             (rnrs bytevectors))  ; bytevector-length, string->utf8

;; START and END count characters, each assumed to be SIZE bytes wide.
(define* (bv->string/partly bv #:optional (start 0)
                                          (end #f)
                                          (size 1)
                                          (encoding "utf-8"))
  (let ((len (if end
                 (* size (- end start))
                 (- (bytevector-length bv) (* size start))))
        ;; address of the first byte of the slice
        (addr (+ (pointer-address (bytevector->pointer bv))
                 (* size start))))
    (pointer->string (make-pointer addr) len encoding)))
-------------------------------end--------------------------------
(define bv (string->utf8 "我了个去啊"))
;; NOTE: each of these Chinese characters takes 3 bytes in UTF-8, so size is 3
(bv->string/partly bv 2 4 3)
==> "个去"
;; And for plain Latin characters, where size is 1:
(define bv2 (string->utf8 "hello world"))
(bv->string/partly bv2 0 5)
==> "hello"
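As a variation on bv->string/partly, the address arithmetic can be left
to bytevector->pointer, which the manual documents as taking an optional
byte offset. This is only a sketch: the name bv->string/offset is made
up here, and it takes plain byte offsets instead of a character size:

```scheme
(use-modules (system foreign)
             (rnrs bytevectors))

;; START-BYTE and LEN-BYTES are byte counts, not character counts.
(define (bv->string/offset bv start-byte len-bytes)
  (pointer->string (bytevector->pointer bv start-byte)
                   len-bytes
                   "utf-8"))

(bv->string/offset (string->utf8 "hello world") 6 5)   ; "world"
```

It still allocates the resulting string, the same as before.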
But I have to give the warning again: when you try to avoid allocation
overhead, you have to face the risk of side effects. Personally, I'd
prefer the pure-functional way. ;-P
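For comparison, a pointer-free version might look like the sketch
below. The name bv-slice->string is hypothetical, START and END are
assumed to be byte offsets, and it pays for one extra allocation (the
temporary bytevector) in exchange for never touching raw addresses:

```scheme
(use-modules (rnrs bytevectors))

;; Copy bytes START..END (END exclusive) into a fresh bytevector and decode it.
(define (bv-slice->string bv start end)
  (let* ((len (- end start))
         (slice (make-bytevector len)))
    (bytevector-copy! bv start slice 0 len)
    (utf8->string slice)))

(bv-slice->string (string->utf8 "hello world") 0 5)   ; "hello"
```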
> TIA
> M