From: ludo@gnu.org (Ludovic Courtès)
To: guile-devel@gnu.org
Subject: Re: Wide strings
Date: Thu, 29 Jan 2009 01:01:34 +0100
Message-ID: <87iqnz6kv5.fsf@gnu.org>
In-Reply-To: m3r62nwa5a.fsf@pobox.com

Hi,

Andy Wingo <wingo@pobox.com> writes:

> On Wed 28 Jan 2009 17:44, Mike Gran <spk121@yahoo.com> writes:
>
>> Since I need this functionality taken care of, and since I have some
>> time to play with it, what's the procedure here?
>
> The best thing IMO would be to hack on it on a Git branch, with small
> and correct patches. We could get you commit access if you don't already
> have it (Ludo or Neil would have to reply on that). Then you could push
> your work directly to a branch, so we all can review it easily.

Yep, setting up a branch is the easiest way.  You can then post updates
or requests for comments as things progress.  We'll need you to assign
the copyright for your changes to the FSF as well (I'll send you an
email sometime later, I need to go to sleep now).  In the meantime, you
can browse the GNU Coding Standards.  :-)

>> Do we need to talk more about what needs to be accomplished? Do we
>> need a complete specification? Do we need a vote on if it is a good
>> idea?
>
> I think you're going in the right direction. More importantly, although
> I can't speak for them, Neil and Ludo seem to think so too.

Yes, as far as I'm concerned.  You're probably more knowledgeable than
I am on this issue, so I'm confident about the direction.

>> 1.  Convert the internal char and string representation to be 
>> explicitly ISO 8859-1.  Add the to/from locale conversion functionality
>> while still retaining 8-bit strings.  Replace C library funcs with 
>> Gnulib string funcs where appropriate.
>
> Sounds appropriate to me.

+1.

>> 2.  Convert the internal representation of chars to 4-byte 
>> codepoints, while still retaining 8-bit strings.
>
> Currently, characters are immediate values, with an 8-bit tag. See
> tags.h:333. So it seems we have 24 bits remaining, and unicode claims
> that 21 bits are the minimum necessary -- so we're good, if you can
> figure out a reasonable way to go from a 32-bit codepoint to a 24-bit
> codepoint.

Good (code)point.  It might be that we'll have to resort to cells for
chars themselves, while storing raw `wchar_t' in the stringbuf contents.
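
To make the bit budget concrete, here is a minimal sketch of the
immediate encoding Andy describes: an 8-bit tag in the low bits and the
codepoint in the remaining 24.  The tag value and all `sketch_' names
are made up for illustration and are not taken from tags.h; the only
hard fact used is that Unicode codepoints stop at U+10FFFF, which fits
in 21 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low 8 bits hold a type tag, the upper 24
   bits hold the codepoint.  The tag value below is invented for this
   sketch and is not the one Guile uses.  */
#define SKETCH_CHAR_TAG      0x0cu
#define SKETCH_TAG_BITS      8
#define SKETCH_MAX_CODEPOINT 0x10FFFFu   /* highest Unicode codepoint */

typedef uint32_t sketch_word;

/* Pack a codepoint into a tagged immediate word.  */
static sketch_word
sketch_make_char (uint32_t codepoint)
{
  assert (codepoint <= SKETCH_MAX_CODEPOINT);
  return (codepoint << SKETCH_TAG_BITS) | SKETCH_CHAR_TAG;
}

/* Return nonzero if W carries the character tag.  */
static int
sketch_is_char (sketch_word w)
{
  return (w & 0xffu) == SKETCH_CHAR_TAG;
}

/* Extract the codepoint from a tagged immediate word.  */
static uint32_t
sketch_char_codepoint (sketch_word w)
{
  assert (sketch_is_char (w));
  return w >> SKETCH_TAG_BITS;
}
```

Under these assumptions the largest codepoint shifted left by 8 still
fits in a 32-bit word, so no cell is needed for the character itself;
whether the real tag layout leaves the same room is something to check
against tags.h.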

>> 3.  Convert strings to be a union of 1 byte and 4 byte chars.
>
> There's room on stringbufs to have a flag, I think. Dunno if that's the
> right way to do it.

I had something like that in mind.
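
For illustration only, the flag-plus-union idea could look roughly like
the sketch below.  It is not the existing stringbuf layout: the struct,
its field names, and the `sketch_' functions are all hypothetical.  It
assumes narrow buffers hold ISO 8859-1 bytes (so a byte value equals its
codepoint) and that a buffer is widened the first time a character
above 0xFF is stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string buffer: a flag selects which union member is live.  */
struct sketch_stringbuf
{
  int is_wide;              /* nonzero: 4-byte elements; zero: 1-byte */
  size_t len;               /* length in characters, not bytes */
  union
  {
    uint8_t *narrow;        /* ISO 8859-1 bytes */
    uint32_t *wide;         /* full codepoints */
  } chars;
};

/* Create a narrow buffer from LEN Latin-1 bytes; NULL on allocation failure.  */
static struct sketch_stringbuf *
sketch_make_narrow (const uint8_t *bytes, size_t len)
{
  struct sketch_stringbuf *buf = malloc (sizeof *buf);
  if (buf == NULL)
    return NULL;
  buf->is_wide = 0;
  buf->len = len;
  buf->chars.narrow = malloc (len ? len : 1);
  if (buf->chars.narrow == NULL)
    {
      free (buf);
      return NULL;
    }
  memcpy (buf->chars.narrow, bytes, len);
  return buf;
}

/* Convert a narrow buffer to a wide one in place; 0 on success.  */
static int
sketch_widen (struct sketch_stringbuf *buf)
{
  if (buf->is_wide)
    return 0;
  uint32_t *w = malloc ((buf->len ? buf->len : 1) * sizeof *w);
  if (w == NULL)
    return -1;
  for (size_t i = 0; i < buf->len; i++)
    w[i] = buf->chars.narrow[i];   /* Latin-1 byte value == codepoint */
  free (buf->chars.narrow);
  buf->chars.wide = w;
  buf->is_wide = 1;
  return 0;
}

/* Read the character at index I as a codepoint.  */
static uint32_t
sketch_ref (const struct sketch_stringbuf *buf, size_t i)
{
  assert (i < buf->len);
  return buf->is_wide ? buf->chars.wide[i] : buf->chars.narrow[i];
}

/* Store codepoint C at index I, widening first if C needs it.  */
static int
sketch_set (struct sketch_stringbuf *buf, size_t i, uint32_t c)
{
  assert (i < buf->len);
  if (!buf->is_wide && c > 0xffu && sketch_widen (buf) != 0)
    return -1;
  if (buf->is_wide)
    buf->chars.wide[i] = c;
  else
    buf->chars.narrow[i] = (uint8_t) c;
  return 0;
}
```

Callers that only go through accessors like these never see which
representation is in use, which is the property that would keep symbols
and keywords from needing separate changes.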

> Converting the symbols and keywords code to do the
> right thing will be a little bit of work, too.

Not if it's handled at the level of stringbufs, I think.

BTW, as long as BDW-GC isn't used, make sure to update
`scm_i_stringbuf_free ()' and friends so that they pass the right number
of bytes to be freed: with wide strings, the byte count is no longer
equal to the character count.
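
A small sketch of the accounting concern, under loud assumptions: the
tracked allocator below is a stand-in written for this example, not
Guile's allocator, and `struct sketch_buf' is hypothetical.  The point
it shows is only that the size passed at free time has to be computed
from the element width, not from the character count alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for an allocator that is told how many bytes are released.  */
static size_t sketch_bytes_accounted;

static void *
sketch_tracked_malloc (size_t size)
{
  void *p = malloc (size ? size : 1);
  if (p != NULL)
    sketch_bytes_accounted += size;
  return p;
}

static void
sketch_tracked_free (void *p, size_t size)
{
  assert (size <= sketch_bytes_accounted);
  sketch_bytes_accounted -= size;
  free (p);
}

/* Hypothetical buffer with 1-byte or 4-byte elements.  */
struct sketch_buf
{
  int is_wide;
  size_t len;       /* length in characters */
  void *chars;
};

/* Bytes occupied by LEN characters at the given width.  */
static size_t
sketch_buf_bytes (int is_wide, size_t len)
{
  return len * (is_wide ? sizeof (uint32_t) : sizeof (uint8_t));
}

static struct sketch_buf *
sketch_buf_alloc (int is_wide, size_t len)
{
  struct sketch_buf *buf = sketch_tracked_malloc (sizeof *buf);
  if (buf == NULL)
    return NULL;
  buf->is_wide = is_wide;
  buf->len = len;
  buf->chars = sketch_tracked_malloc (sketch_buf_bytes (is_wide, len));
  if (buf->chars == NULL)
    {
      sketch_tracked_free (buf, sizeof *buf);
      return NULL;
    }
  return buf;
}

/* Free BUF, reporting the same byte counts that were allocated.  */
static void
sketch_buf_free (struct sketch_buf *buf)
{
  sketch_tracked_free (buf->chars, sketch_buf_bytes (buf->is_wide, buf->len));
  sketch_tracked_free (buf, sizeof *buf);
}
```

If the free path used `len' directly, a wide buffer would under-report
by a factor of four and the accounting would drift.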

Thanks,
Ludo'.




