* UTF-8 and new ports
@ 2008-02-14 21:40 Mike Gran
2008-02-15 2:39 ` Stephen Compall
2008-02-15 8:36 ` Ludovic Courtès
From: Mike Gran @ 2008-02-14 21:40 UTC
To: Guile User
Hi-
Suppose I'm creating a new Guile port type that is
going to use NCurses primitives for input (scm_getc)
and output (display). NCurses can both receive input
and display output of wide characters, but these
functions operate on 32-bit wide Unicode codepoints,
aka UTF-32.
It seems that port types are inherently 8-bit, right?
So to make this work, the ports will have to store and
transmit characters as UTF-8 encoded data. The
'fill_input' function will have to convert UTF-32 to
UTF-8 and then cache them, passing them 1 byte at a
time as requested. The 'write' function will receive
data 1 byte at a time and buffer it. It will only
write the character once the complete UTF-8 sequence
for one UTF-32 codepoint has been received.
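A minimal sketch of the two conversions this plan relies on, written in plain C with no Guile or NCurses headers. `utf8_encode` is what a 'fill_input' implementation could use to turn each UTF-32 codepoint from NCurses into the bytes it caches, and `utf8_feed` is a byte-at-a-time decoder a 'write' implementation could use to hold bytes back until a full sequence has arrived. Both function names and the `utf8_decoder` struct are hypothetical; this is not Guile's port-type API.

```c
#include <stddef.h>
#include <stdint.h>

/* fill_input side: encode one UTF-32 codepoint as UTF-8 into out[0..3].
   Returns the number of bytes written, or 0 if cp is a surrogate or
   lies above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    } else if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0; /* UTF-16 surrogates are not characters */
    } else if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    } else if (cp <= 0x10FFFF) {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* write side: per-port state, so a character split across two
   write calls is still reassembled correctly. */
struct utf8_decoder {
    uint32_t cp; /* codepoint assembled so far */
    int need;    /* continuation bytes still expected */
};

/* Feed one byte. Returns 1 and stores *out when a codepoint is complete,
   0 if more bytes are needed, -1 on an unexpected byte (state is reset).
   Minimal: overlong forms, surrogates and values above U+10FFFF are not
   rejected. */
static int utf8_feed(struct utf8_decoder *d, unsigned char b, uint32_t *out)
{
    if (d->need == 0) {
        if (b < 0x80) { *out = b; return 1; }
        else if ((b & 0xE0) == 0xC0) { d->cp = b & 0x1F; d->need = 1; }
        else if ((b & 0xF0) == 0xE0) { d->cp = b & 0x0F; d->need = 2; }
        else if ((b & 0xF8) == 0xF0) { d->cp = b & 0x07; d->need = 3; }
        else return -1; /* stray continuation byte or invalid lead byte */
        return 0;
    }
    if ((b & 0xC0) != 0x80) { d->need = 0; return -1; }
    d->cp = (d->cp << 6) | (b & 0x3F);
    if (--d->need == 0) { *out = d->cp; return 1; }
    return 0;
}
```

Keeping the decoder state in a struct rather than in static variables means each port can own one, which matters if several NCurses ports are open at once.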
Sound right?
Has anyone already done this sort of thing?
--
Mike Gran
* Re: UTF-8 and new ports
From: Stephen Compall @ 2008-02-15 2:39 UTC
To: Mike Gran; +Cc: Guile User
Mike Gran <spk121@yahoo.com> writes:
> It seems that port types are inherently 8-bit, right?
> So to make this work, the ports will have to store and
> transmit characters as UTF-8 encoded data. The
> 'fill_input' function will have to convert UTF-32 to
> UTF-8 and then cache them, passing them 1 byte at a
> time as requested. The 'write' function will receive
> data 1 byte at a time and buffer it. It will only
> write the character once the complete UTF-8 sequence
> for one UTF-32 codepoint has been received.
Alternatively, you could assume an 8-bit character set (either taken from
LC_CTYPE, or forced to Latin-1), recode output to UTF-32, and on input
either ignore characters outside the 8-bit character set or deliver NULs
or something else convenient (maybe space?) in their place. This would be
reasonable as Guile characters are 8-bit anyway.
--
But you know how reluctant paranormal phenomena are to reveal
themselves when skeptics are present. --Robert Sheaffer, SkI 9/2003
* Re: UTF-8 and new ports
From: Ludovic Courtès @ 2008-02-15 8:36 UTC
To: guile-user
Hi,
Mike Gran <spk121@yahoo.com> writes:
> It seems that port types are inherently 8-bit, right?
> So to make this work, the ports will have to store and
> transmit characters as UTF-8 encoded data.
Indeed, the C port API is inherently 8-bit.
I don't know if it helps, but Guile-R6RS-Libs[*] provides limited UTF
handling:
guile> (use-modules (r6rs bytevector) (r6rs io ports))
guile> (define bv (get-bytevector-all (open-input-string "hello world")))
guile> (utf8->string bv)
"hello world"
It assumes that Guile strings are encoded in the current locale's 8-bit
charset, though...
Thanks,
Ludovic.
[*] http://repo.or.cz/w/guile-r6rs-libs.git