unofficial mirror of guile-devel@gnu.org 
From: Mike Gran <spk121@yahoo.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guile-devel@gnu.org
Subject: Re: unicode status
Date: Mon, 14 Sep 2009 07:27:41 -0700
Message-ID: <1252938461.24639.182.camel@localhost.localdomain>
In-Reply-To: <87ljkiebv9.fsf@gnu.org>

On Mon, 2009-09-14 at 00:08 +0200, Ludovic Courtès wrote:
> Hello!
> 
> Mike Gran <spk121@yahoo.com> writes:
> 
> > ** Ports do transcoding
> 
> Speaking of this, would you be willing to implement R6RS’ transcoder
> API in ‘r6rs-ports.c’?  :-)

Hard to say.  After September, my free time evaporates.  However, it
shouldn't be a very difficult task to do.  The difference between R6RS
ports and what we've done so far is the end-of-line conversions that
R6RS requires: CR, CR/LF, NEL, CR/NEL, LS, etc.
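
A minimal sketch of the input side of that end-of-line conversion, in
plain C with no Guile or libunistring dependency.  It assumes the R6RS
rule that, on input, each recognized line ending (LF, CR, CR LF, NEL,
CR NEL, LS) is read as a single linefeed.  The name normalize_eol and
its signature are hypothetical, not part of r6rs-ports.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy the N code points of IN to OUT, rewriting
   every line ending as a single LF (U+000A).  OUT must have room for N
   code points.  Returns the number of code points written.  */
static size_t
normalize_eol (const uint32_t *in, size_t n, uint32_t *out)
{
  size_t i = 0, j = 0;

  assert (n == 0 || (in != NULL && out != NULL));

  while (i < n)
    {
      uint32_t c = in[i++];

      if (c == 0x000D)
        {
          /* CR alone, CR LF and CR NEL each count as one line ending,
             so swallow a following LF or NEL.  */
          if (i < n && (in[i] == 0x000A || in[i] == 0x0085))
            i++;
          out[j++] = 0x000A;
        }
      else if (c == 0x0085 || c == 0x2028)
        /* NEL or LS on its own.  */
        out[j++] = 0x000A;
      else
        out[j++] = c;
    }

  return j;
}
```

The output side would go the other way, expanding each LF into the
sequence selected by the transcoder's eol-style.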

> 
> > * The i18n library hasn't been touched.  It should probably move to use
> > functions like u32_casecmp from libunistring for unicode-capable
> > locale-specific sorting.
> 
> Is u32_casecmp locale-dependent?

From the docs (for u32_casecoll, the collating variant):

 -- Function: int u32_casecoll (const uint32_t *S1, size_t N1, const
          uint32_t *S2, size_t N2, const char *ISO639_LANGUAGE,
          uninorm_t NF, int *RESULTP)
     Compares S1 and S2, ignoring differences in case and normalization,
     using the collation rules of the current locale.

> 
> > But the #ifdef and locale madness in i18n is
> > deep.
> 
> Heh heh, it’s deep but needed.  It allows us to provide an API with
> first-class locale objects, akin to POSIX 2008’s ‘locale_t’, which is
> neat IMO.
> 
> At any rate, the parts you’re interested in can probably be modified
> without touching the #ifdef madness.

The libunistring way for sorting would be something like
1. set the locale
2. convert the strings to unistring u32 strings
3. get the locale's 'language' with uc_locale_language ()
4. use the language and strings as input to u32_strcoll or u32_casecoll
5. profit!
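
The shape of steps 1, 2 and 4 can be sketched with ISO C alone.  This
is a stand-in, not the libunistring code: wchar_t strings take the place
of u32 strings and wcscoll takes the place of u32_strcoll.  Step 3 has
no ISO C counterpart, because wcscoll reads the current locale
implicitly.  The helper name locale_compare is hypothetical.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

enum { MAX_WIDE = 256 };

/* Hypothetical helper, not Guile or libunistring API.  Compares the
   multibyte strings A and B under the collation rules of LOCALE_NAME.
   Stores a value <0, 0 or >0 in *RESULTP and returns 0 on success;
   returns -1 if the locale is unavailable or a string cannot be
   converted or is too long.  */
static int
locale_compare (const char *locale_name, const char *a, const char *b,
                int *resultp)
{
  wchar_t wa[MAX_WIDE], wb[MAX_WIDE];
  size_t na, nb;

  assert (resultp != NULL);

  /* Step 1: set the locale.  */
  if (setlocale (LC_ALL, locale_name) == NULL)
    return -1;

  /* Step 2: convert to wide strings (u32 strings with libunistring).  */
  na = mbstowcs (wa, a, MAX_WIDE);
  nb = mbstowcs (wb, b, MAX_WIDE);
  if (na == (size_t) -1 || nb == (size_t) -1
      || na == MAX_WIDE || nb == MAX_WIDE)
    return -1;

  /* Step 3 would call uc_locale_language () here.  */

  /* Step 4: collate (u32_strcoll or u32_casecoll with libunistring).  */
  *resultp = wcscoll (wa, wb);
  return 0;
}
```

Note that setlocale changes the locale for the whole process and the
sketch does not restore it.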

So once that problem of setting the locale and getting the
uc_locale_language is solved generically for one of the i18n funcs, the
rest should fall into place.  If, in your copious free time (LOL), you
want to figure out that for one func, I can do the rest by extension.
Otherwise, I'll get to it eventually.

You can't really do Unicode sorting without also including the
normalization functions (string-normalize-nfc, string-normalize-nfkc,
etc.) from (rnrs unicode (6)), so those will need to be added.  That
isn't hard either: libunistring does the low-level op.
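
A toy illustration of why normalization has to come before comparison:
"é" can be one code point (U+00E9) or two (U+0065 U+0301), and the two
spellings differ code point by code point.  The sketch below is not NFC
and not libunistring code.  It only composes a handful of hard-coded
pairs, where real normalization needs the full Unicode tables and
canonical reordering.  All names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy table: a few base + combining mark pairs and their precomposed
   forms.  Real NFC covers far more than this.  */
struct compose_pair { uint32_t base, mark, composed; };

static const struct compose_pair toy_pairs[] = {
  { 0x0061, 0x0300, 0x00E0 },   /* a + grave     -> U+00E0 */
  { 0x0065, 0x0301, 0x00E9 },   /* e + acute     -> U+00E9 */
  { 0x006E, 0x0303, 0x00F1 },   /* n + tilde     -> U+00F1 */
  { 0x006F, 0x0308, 0x00F6 },   /* o + diaeresis -> U+00F6 */
  { 0x0075, 0x0308, 0x00FC },   /* u + diaeresis -> U+00FC */
};

/* Copy the N code points of IN to OUT, replacing each pair found in
   toy_pairs with its precomposed form.  OUT must have room for N code
   points.  Returns the number of code points written.  */
static size_t
toy_compose (const uint32_t *in, size_t n, uint32_t *out)
{
  size_t i = 0, j = 0;

  assert (n == 0 || (in != NULL && out != NULL));

  while (i < n)
    {
      uint32_t c = in[i++];

      if (i < n)
        for (size_t k = 0; k < sizeof toy_pairs / sizeof toy_pairs[0]; k++)
          if (toy_pairs[k].base == c && toy_pairs[k].mark == in[i])
            {
              c = toy_pairs[k].composed;
              i++;              /* the mark has been consumed */
              break;
            }

      out[j++] = c;
    }

  return j;
}
```

After toy_compose, the decomposed and precomposed spellings of "café"
come out as the same four code points, so a code point comparison
treats them as equal.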

> Overall, it seems to me that Unicode support is in a very good shape and
> the points above aren’t too worrying.

Thanks,

Mike





