From: Mike Gran
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Internal visibility
Date: Thu, 12 Jun 2008 20:45:25 +0000 (UTC)
To: guile-devel@gnu.org

Ludovic Courtès writes:

> Yes, that's probably a good idea. At any rate, we only have
> `scm_to_locale_string ()' currently so it's not too late to add a single
> function with an encoding parameter in lieu of the proposed
> `scm_to_{utf8,utf16,utf32,ucs4,...}_string ()'.
>
> But first of all, one needs to implement Unicode support.

FWIW, I have a complete Unicode support library for Guile called GuICU.
It lives at http://gano.sourceforge.net. It works for me, but it hasn't
been widely tested.

It is built on the large and cumbersome IBM ICU library. ICU encodes
strings internally as UTF-16, which I have always thought of as a poor
idea, since it neither allows O(1) seeking to individual code points nor
works well with UTF-8.

Based on my experience with ICU and putting this library together, and
looking at what R6RS claims should be the future for Unicode, I really
do think that UTF-32 is the way to go.

Alternatively, one could build a string library where strings are
represented as either u8 or u32 vectors. If a string function is asked
to operate on a u32 vector, it will assume a UTF-32 encoding. If a
string function is asked to operate on a u8 vector, it will either
require a locale or, as a fallback, treat the string as a raw byte
vector. This would be twice the work to implement, though.
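
To make the two-representation idea concrete, here is a rough sketch in C. Every name in it (`struct str`, `str_ref`, `utf16_ref`) is hypothetical and is not part of Guile, GuICU, or ICU. It only covers the raw-byte fallback for u8 vectors, not locale-aware decoding, and the UTF-16 helper assumes well-formed input and an in-range index. It is included only to show why UTF-16 indexing needs a scan while UTF-32 does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types and names: a sketch, not Guile or GuICU API. */
enum str_kind { STR_U8, STR_U32 };

struct str {
    enum str_kind kind;
    size_t len;               /* number of elements, not bytes */
    union {
        const uint8_t  *u8;   /* raw bytes; meaning depends on the locale */
        const uint32_t *u32;  /* UTF-32: one element per code point */
    } data;
};

/* O(1) element access for either representation.  For STR_U32 the
   result is a code point; for STR_U8 it is a raw byte, i.e. the
   "no locale" fallback.  The caller must keep i < s->len. */
uint32_t str_ref(const struct str *s, size_t i)
{
    return s->kind == STR_U32 ? s->data.u32[i] : s->data.u8[i];
}

/* For contrast: fetching code point number i from UTF-16 needs a
   linear scan, because code points above U+FFFF occupy two code
   units (a surrogate pair).  Assumes well-formed input and a valid i. */
uint32_t utf16_ref(const uint16_t *s, size_t i)
{
    size_t pos = 0;

    /* Skip i code points, stepping over surrogate pairs as one unit. */
    while (i > 0) {
        pos += (s[pos] >= 0xD800 && s[pos] <= 0xDBFF) ? 2 : 1;
        i--;
    }

    uint32_t hi = s[pos];
    if (hi >= 0xD800 && hi <= 0xDBFF) {
        uint32_t lo = s[pos + 1];
        return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    }
    return hi;
}
```

With U+1D11E stored as the pair 0xD834 0xDD1E, `utf16_ref` has to walk past every earlier element to find it, whereas `str_ref` on a u32 vector is a single array access. The cost is four bytes per code point, which is presumably the trade-off one accepts in choosing UTF-32.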