From: Mike Gran
To: Ludovic Courtès
Cc: guile-devel@gnu.org
Newsgroups: gmane.lisp.guile.devel
Subject: Re: unicode status
Date: Mon, 14 Sep 2009 07:27:41 -0700
Message-ID: <1252938461.24639.182.camel@localhost.localdomain>
In-Reply-To: <87ljkiebv9.fsf@gnu.org>
References: <1252249345.17414.21280.camel@localhost.localdomain> <87ljkiebv9.fsf@gnu.org>

On Mon, 2009-09-14 at 00:08 +0200, Ludovic Courtès wrote:
> Hello!
>
> Mike Gran writes:
>
> > ** Ports do transcoding
>
> Speaking of this, would you be willing to implement R6RS’ transcoder
> API in ‘r6rs-ports.c’?  :-)

Hard to say.  After September, my free time evaporates.  However, it
shouldn't be a very difficult task to do.
The difference between R6RS ports and what we've done so far is the
end-of-line conversions that R6RS requires: CR, CR/LF, NEL, CR/NEL,
LS, etc.

> > * The i18n library hasn't been touched.  It should probably move to use
> >   functions like u32_casecmp from libunistring for unicode-capable
> >   locale-specific sorting.
>
> Is u32_casecmp locale-dependent?

From the docs

 -- Function: int u32_casecoll (const uint32_t *S1, size_t N1,
          const uint32_t *S2, size_t N2, const char *ISO639_LANGUAGE,
          uninorm_t NF, int *RESULTP)
     Compares S1 and S2, ignoring differences in case and
     normalization, using the collation rules of the current locale.

> > But the #ifdef and locale madness in i18n is
> > deep.
>
> Heh heh, it’s deep but needed.  It allows us to provide an API with
> first-class locale objects, akin to POSIX 2008’s ‘locale_t’, which is
> neat IMO.
>
> At any rate, the parts you’re interested in can probably be modified
> without touching the #ifdef madness.

The libunistring way for sorting would be something like

1. set the locale
2. convert the strings to unistring u32 strings
3. get the locale's 'language' with uc_locale_language ()
4. use the language and strings as input to u32_strcoll or u32_casecoll
5. profit!

So once that problem of setting the locale and getting the
uc_locale_language is solved generically for one of the i18n funcs, the
rest should fall into place.

If, in your copious free time (LOL), you want to figure that out for one
func, I can do the rest by extension.  Otherwise, I'll get to it
eventually.

You can't really do Unicode sorting without also including the
normalization functions string-normalize-nfc, string-normalize-nfkc etc.
from (rnrs unicode (6)), so those will need to be added.  That also
isn't hard: libunistring does the low-level op.

> Overall, it seems to me that Unicode support is in a very good shape and
> the points above aren’t too worrying.

Thanks,

Mike
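
The sorting steps listed in the message above can be sketched roughly as
follows. This is not Guile or libunistring code: it is a stand-in that
uses only standard C (setlocale, mbstowcs, towlower, wcscoll), so it
shows the shape of the pipeline without the libunistring dependency. The
helper names use_environment_locale, to_folded_wide and casecoll_sketch
are hypothetical, and simple per-character lower-casing is only an
approximation of real Unicode case folding and normalization.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Step 1 (hypothetical helper): adopt the user's locale, falling back
   to "C" if the environment names a locale that is not installed.  */
static const char *
use_environment_locale (void)
{
  const char *name = setlocale (LC_ALL, "");
  return name != NULL ? name : setlocale (LC_ALL, "C");
}

/* Step 2 (hypothetical helper): convert a multibyte string in the
   current locale's encoding to a newly allocated wide string and
   lower-case it.  Returns NULL on conversion or allocation failure.
   With libunistring this is where the u32 string would be built.  */
static wchar_t *
to_folded_wide (const char *s)
{
  size_t cap = strlen (s) + 1;   /* never more wide chars than bytes */
  wchar_t *w = malloc (cap * sizeof *w);
  if (w == NULL)
    return NULL;
  size_t n = mbstowcs (w, s, cap);
  if (n == (size_t) -1)
    {
      free (w);
      return NULL;
    }
  for (size_t i = 0; i < n; i++)
    w[i] = (wchar_t) towlower ((wint_t) w[i]);
  return w;
}

/* Steps 3-4 (hypothetical helper): compare two strings using the
   current locale's collation rules, ignoring case.  Stores a value
   <0, 0 or >0 in *resultp and returns 0, or returns -1 on failure,
   mirroring the RESULTP out-parameter of the prototype quoted above.
   wcscoll reads the locale implicitly, so no explicit language
   argument appears here.  */
static int
casecoll_sketch (const char *a, const char *b, int *resultp)
{
  wchar_t *wa = to_folded_wide (a);
  wchar_t *wb = to_folded_wide (b);
  int status = -1;
  if (wa != NULL && wb != NULL)
    {
      *resultp = wcscoll (wa, wb);
      status = 0;
    }
  free (wa);
  free (wb);
  return status;
}
```

A caller would run use_environment_locale () once at startup and then
call casecoll_sketch ("Guile", "GUILE", &r), which leaves r at 0 because
the two strings differ only in case. In the libunistring version
described in the message, the wide-string conversion would produce u32
strings, and the towlower/wcscoll pair would be replaced by a single
u32_casecoll call that takes the result of uc_locale_language () and a
normalization form.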