unofficial mirror of guile-devel@gnu.org 
From: Mike Gran <spk121@yahoo.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guile-devel@gnu.org
Subject: Re: Wide strings status
Date: Tue, 21 Apr 2009 20:26:20 -0700
Message-ID: <1240370780.3133.102.camel@localhost.localdomain>
In-Reply-To: <87bpqpu1r4.fsf@gnu.org>

On Tue, 2009-04-21 at 23:37 +0200, Ludovic Courtès wrote:

> > This is all going to be slower than before because of the string
> > conversion operations, but I didn't want to do any premature
> > optimization.  First, I wanted to get it working, but there is plenty
> > of room for optimization later.
> 
> Good.  Maybe it'd be nice to add simple micro-benchmarks for
> `string-ref', `string-set!' et al. under `benchmarks'.
> 

I'll put it on my todo list.

> > Character encoding needs to be a property of ports, so that not all
> > string operations are done in the current locale.  This is necessary so
> > that UTF-8-encoded source files are not interpreted differently based on
> > the current locale.
> 
> You seem to imply that `scm_getc ()' will now return a Unicode
> codepoint, is that right?  What about `scm_c_{read,write} ()', and
> `scm_{get,put}s ()'?
> 

I vacillate on this, but I think the most logical approach is to have
scm_getc return codepoints and to have the rest of those functions
handle strings that may contain wide characters.  This applies if and
only if the port has been assigned a character encoding.  If a port has
no associated encoding, it will be treated as de facto ISO-8859-1:
character values from 0 to 255 are stored without any interpretation,
and characters greater than 255 are invalid.  (Unicode codepoints 0 to
255 are, by design, identical to ISO-8859-1.)
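To make the proposed behaviour concrete, here is a minimal C sketch of a
getc that returns codepoints.  Everything in it is hypothetical: the
`sketch_port` struct, the `sketch_*` functions and the choice of UTF-8
as the only encoding are illustrative assumptions, not Guile's scm_t_port
or the real scm_getc.  A port with an encoding decodes multi-byte
sequences; a port without one falls back to ISO-8859-1, where each byte
is its own codepoint and output above 255 is rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical port model for illustration only; this is not Guile's
   scm_t_port.  A port is a byte buffer plus an optional encoding. */
typedef enum { SKETCH_ENC_NONE, SKETCH_ENC_UTF8 } sketch_encoding;

typedef struct
{
  const unsigned char *buf;
  size_t len;
  size_t pos;
  sketch_encoding enc;
} sketch_port;

#define SKETCH_EOF (-1)
#define SKETCH_INVALID (-2)

/* Decode one UTF-8 sequence starting at p->pos.  Simplified: it does not
   reject overlong forms, surrogates, or values above U+10FFFF. */
static int32_t
sketch_decode_utf8 (sketch_port *p)
{
  unsigned char b0 = p->buf[p->pos++];
  int32_t cp;
  int extra;

  if (b0 < 0x80)
    return b0;
  else if ((b0 & 0xE0) == 0xC0)
    { cp = b0 & 0x1F; extra = 1; }
  else if ((b0 & 0xF0) == 0xE0)
    { cp = b0 & 0x0F; extra = 2; }
  else if ((b0 & 0xF8) == 0xF0)
    { cp = b0 & 0x07; extra = 3; }
  else
    return SKETCH_INVALID;

  for (int i = 0; i < extra; i++)
    {
      /* Each continuation byte must look like 10xxxxxx. */
      if (p->pos >= p->len || (p->buf[p->pos] & 0xC0) != 0x80)
        return SKETCH_INVALID;
      cp = (cp << 6) | (p->buf[p->pos++] & 0x3F);
    }
  return cp;
}

/* Return the next codepoint.  With an encoding, decode it; without one,
   treat the port as ISO-8859-1, so each byte is its own codepoint. */
int32_t
sketch_getc (sketch_port *p)
{
  if (p->pos >= p->len)
    return SKETCH_EOF;
  if (p->enc == SKETCH_ENC_UTF8)
    return sketch_decode_utf8 (p);
  return p->buf[p->pos++];
}

/* Output to a port without an encoding: only 0..255 can be written. */
int
sketch_putc_latin1 (int32_t cp, unsigned char *out)
{
  if (cp < 0 || cp > 255)
    return -1;   /* not representable in ISO-8859-1 */
  *out = (unsigned char) cp;
  return 0;
}
```

With this model, the two bytes C3 A9 read as a single codepoint, U+00E9,
from a UTF-8 port, but as two codepoints, U+00C3 and U+00A9, from a port
with no encoding.  That difference is why the encoding has to be pinned
to the port rather than taken from the current locale.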

> > The VM and interpreter need to be updated to deal with wide chars and
> > probably in other ways that are unclear to me now.  Wide strings are
> > currently getting truncated to 8-bit somewhere in there.
> 
> The compiler could use bytevectors when dealing with bytecode.  Maybe
> that would clarify things.

On those issues, I'll have to defer to the wisdom of others.  I'll do
what I can with the C code, and then I'll need help.
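The truncation symptom described above is what happens whenever a
codepoint passes through an 8-bit slot.  The C sketch below illustrates
the failure mode next to an explicit byte-level serialization, loosely in
the spirit of the suggestion to handle bytecode as bytevectors.  The
helper names and the four-byte little-endian layout are hypothetical
choices for this example; this is not code from Guile's VM or compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Lossy: storing a 32-bit codepoint in an 8-bit slot keeps only the
   low 8 bits, so wide characters are silently truncated. */
unsigned char
sketch_narrow_store (uint32_t cp)
{
  return (unsigned char) cp;   /* e.g. U+03BB becomes 0xBB */
}

/* Lossless alternative: write each codepoint explicitly as four
   little-endian bytes into a plain byte buffer (hypothetical layout).
   OUT must have room for 4 * N bytes.  Returns the bytes written. */
size_t
sketch_serialize_wide (const uint32_t *s, size_t n, unsigned char *out)
{
  for (size_t i = 0; i < n; i++)
    {
      out[4 * i + 0] = (unsigned char) (s[i] & 0xFF);
      out[4 * i + 1] = (unsigned char) ((s[i] >> 8) & 0xFF);
      out[4 * i + 2] = (unsigned char) ((s[i] >> 16) & 0xFF);
      out[4 * i + 3] = (unsigned char) ((s[i] >> 24) & 0xFF);
    }
  return 4 * n;
}

/* Read back the I-th codepoint written by sketch_serialize_wide. */
uint32_t
sketch_deserialize_at (const unsigned char *in, size_t i)
{
  return (uint32_t) in[4 * i]
         | ((uint32_t) in[4 * i + 1] << 8)
         | ((uint32_t) in[4 * i + 2] << 16)
         | ((uint32_t) in[4 * i + 3] << 24);
}
```

Making the byte layout explicit like this means any place that still
narrows a codepoint has to do so deliberately, which may make the spot
where wide strings lose their high bits easier to find.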

> 
> Thanks,
> Ludo'.
> 

Thanks for taking the time.

-Mike


Thread overview: 4+ messages
2009-04-21  2:11 Wide strings status Mike Gran
2009-04-21 21:37 ` Ludovic Courtès
2009-04-22  3:26   ` Mike Gran [this message]
2009-04-22 20:03     ` Ludovic Courtès
