From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mike Gran
Newsgroups: gmane.lisp.guile.devel
Subject: Re: unicode status
Date: Sun, 06 Sep 2009 08:02:25 -0700
Message-ID: <1252249345.17414.21280.camel@localhost.localdomain>
Cc: guile-devel
To: Andy Wingo
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Sun, 2009-09-06 at 12:45 +0200, Andy Wingo wrote:
> Hey Mike,
>
> Would you mind posting to the list a "state of unicode & guile" summary?
> I'm very excited about finally being able to say "Guile does unicode",
> and was wondering what was left to do :)
>
> Andy

OK.

First, here's the stuff I've already put in NEWS

** Characters

Characters can take the whole Unicode range. char-upcase and
char-downcase use default Unicode casing rules. Character comparisons
such as chardouble and locale-string->int.
Bruno has some suggestions on how to do that at
http://savannah.gnu.org/support/?106998

* I haven't done any testing on readline or gettext.

* Unicode-capable regex has not been implemented. Libunistring might
  do this someday. Until then, there will probably have to be the hack
  where strings are converted to UTF-8 encoding to pass through regex.
  This doesn't get you Unicode regex, but it keeps non-ASCII from
  being mangled by regex.

* Emacs has a lot of aliases that can be used in the
  "-*- coding: XXXXX -*-" line, like latin-1, that aren't valid
  encoding names. The reader should be modified to understand the
  common ones.

* The whole issue of R6RS compliance will have to be dealt with some
  day. For example, I went with \xHH \uHHHH and \UHHHHHH escapes
  because they were backwards compatible with the \xHH we already had.
  R6RS uses a variable-length hex escape terminated by a semicolon:
  \xHH; \xHHH;. These are not backward compatible. Some R6RS
  functions are missing: string-foldcase and the string normalization
  routines. Also, R6RS and R5RS seem to disagree on the definition of
  string-upcase et al. R6RS is clear that the result of string-upcase
  can have more letters than its input, and it gets rid of
  string-upcase! for the same reason.

That's all I remember off the top of my head.

Thanks,

Mike
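
To illustrate the string-upcase point above: a minimal sketch, written in Python rather than Guile, of how full Unicode case mapping can return a string longer than its input. The helper name upcase_growth is hypothetical and only for illustration; it says nothing about how Guile implements string-upcase. It assumes Python 3, whose str.upper applies the full Unicode case mappings.

```python
def upcase_growth(s):
    """Upper-case s and report the lengths before and after.

    Hypothetical helper for illustration only. Python's str.upper()
    applies the full Unicode case mappings, so a single character can
    expand to several.
    """
    upper = s.upper()
    return upper, len(s), len(upper)


if __name__ == "__main__":
    # German sharp s has no single-character uppercase in the default
    # mapping; it becomes "SS", so the result grows by one character.
    print(upcase_growth("straße"))
```

Because the result can be longer than the input, an in-place variant that overwrites a fixed-length string cannot in general hold it, which matches the reason given above for R6RS dropping string-upcase!.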