From: Mike Gran <spk121@yahoo.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guile-devel@gnu.org
Subject: Re: Wide strings status
Date: Tue, 21 Apr 2009 20:26:20 -0700
Message-ID: <1240370780.3133.102.camel@localhost.localdomain>
In-Reply-To: <87bpqpu1r4.fsf@gnu.org>
On Tue, 2009-04-21 at 23:37 +0200, Ludovic Courtès wrote:
> > This is all going to be slower than before because of the string
> > conversion operations, but, I didn't want to do any premature
> > optimization. First, I wanted to get it working, but, there is plenty
> > of room for optimization later.
>
> Good. Maybe it'd be nice to add simple micro-benchmarks for
> `string-ref', `string-set!' et al. under `benchmarks'.
>
I'll put it on my to-do list.
> > Character encoding needs to be a property of ports, so that not all
> > string operations are done in the current locale. This is necessary so
> > that UTF-8-encoded source files are not interpreted differently based on
> > the current locale.
>
> You seem to imply that `scm_getc ()' will now return a Unicode
> codepoint, is that right? What about `scm_c_{read,write} ()', and
> `scm_{get,put}s ()'?
>
I vacillate on this, but I think the most logical approach is to have
scm_getc return codepoints and to have the rest of those functions
read or write strings that may contain wide characters. That applies
only if the port has been assigned a character encoding. A port without
an associated encoding will be treated as de facto ISO-8859-1:
character values between 0 and 255 are stored without any
interpretation, and characters greater than 255 are invalid. (Unicode
codepoints 0 to 255 are by design the same as ISO-8859-1.)
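
To make the two cases concrete, here is a minimal sketch in C of how one
character might be decoded from a port's byte buffer. It is not libguile
code: `port_encoding`, `decode_char` and `DECODE_ERROR` are hypothetical
names invented for this illustration, and UTF-8 merely stands in for "a
port that has been assigned an encoding".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; none of this is libguile API. */
enum port_encoding { PORT_ENC_NONE, PORT_ENC_UTF8 };

#define DECODE_ERROR (-1)

/* Decode one character from buf[0..len).  On success, return the
   codepoint and store the number of bytes consumed in *used.
   On failure, return DECODE_ERROR and leave *used untouched.  */
int32_t
decode_char (enum port_encoding enc, const unsigned char *buf,
             size_t len, size_t *used)
{
  if (len == 0)
    return DECODE_ERROR;

  if (enc == PORT_ENC_NONE)
    {
      /* No encoding assigned: treat the byte as ISO-8859-1, so the
         byte value 0..255 is the codepoint, with no interpretation.  */
      *used = 1;
      return buf[0];
    }

  /* PORT_ENC_UTF8: the lead byte gives the sequence length.  */
  unsigned char b0 = buf[0];
  size_t n;
  int32_t cp, min;

  if (b0 < 0x80)
    {
      *used = 1;
      return b0;
    }
  else if ((b0 & 0xE0) == 0xC0) { n = 2; cp = b0 & 0x1F; min = 0x80; }
  else if ((b0 & 0xF0) == 0xE0) { n = 3; cp = b0 & 0x0F; min = 0x800; }
  else if ((b0 & 0xF8) == 0xF0) { n = 4; cp = b0 & 0x07; min = 0x10000; }
  else
    return DECODE_ERROR;        /* stray continuation or invalid lead byte */

  if (len < n)
    return DECODE_ERROR;        /* truncated sequence */

  for (size_t i = 1; i < n; i++)
    {
      if ((buf[i] & 0xC0) != 0x80)
        return DECODE_ERROR;    /* not a continuation byte */
      cp = (cp << 6) | (buf[i] & 0x3F);
    }

  /* Reject overlong forms, surrogates, and values past U+10FFFF.  */
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return DECODE_ERROR;

  *used = n;
  return cp;
}
```

With the bytes 0xC3 0xA9 in the buffer, the no-encoding case yields
codepoint 0xC3 after consuming one byte, while the UTF-8 case yields
0xE9 after consuming two. That difference is why the encoding has to
travel with the port rather than with the locale.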
> > The VM and interpreter need to be updated to deal with wide chars and
> > probably in other ways that are unclear to me now. Wide strings are
> > currently getting truncated to 8-bit somewhere in there.
>
> The compiler could use bytevectors when dealing with bytecode. Maybe
> that would clarify things.
On those issues, I'll have to defer to the wisdom of others. I'll do
what I can with the C code, and then I'll need help.
>
> Thanks,
> Ludo'.
>
Thanks for taking the time.
-Mike