From: Mike Gran
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Wide strings status
Date: Tue, 21 Apr 2009 20:26:20 -0700
Message-ID: <1240370780.3133.102.camel@localhost.localdomain>
In-Reply-To: <87bpqpu1r4.fsf@gnu.org>
To: Ludovic Courtès
Cc: guile-devel@gnu.org

On Tue, 2009-04-21 at 23:37 +0200, Ludovic Courtès wrote:
> > This is all going to be slower than before because of the string
> > conversion operations, but, I didn't want to do any premature
> > optimization.  First, I wanted to get it working, but, there is plenty
> > of room for optimization later.
>
> Good.  Maybe it'd be nice to add simple micro-benchmarks for
> `string-ref', `string-set!' et al. under `benchmarks'.
>

I'll put it on my todo list.

> > Character encoding needs to be a property of ports, so that not all
> > string operations are done in the current locale.  This is necessary so
> > that UTF-8-encoded source files are not interpreted differently based on
> > the current locale.
>
> You seem to imply that `scm_getc ()' will now return a Unicode
> codepoint, is that right?  What about `scm_c_{read,write} ()', and
> `scm_{get,put}s ()'?
>

I vacillate on this, but I think the most logical approach is to have
scm_getc return codepoints and to have the rest of those functions
return strings that could contain wide characters.  This applies if and
only if the port has been assigned a character encoding.

If a port doesn't have an associated encoding, it will be treated as
de facto ISO-8859-1, where character values between 0 and 255 are
stored without any interpretation and characters greater than 255 are
invalid.  (Unicode codepoints 0 to 255 are by design the same as
ISO-8859-1.)

> > The VM and interpreter need to be updated to deal with wide chars and
> > probably in other ways that are unclear to me now.  Wide strings are
> > currently getting truncated to 8-bit somewhere in there.
>
> The compiler could use bytevectors when dealing with bytecode.  Maybe
> that would clarify things.

On those issues, I'll have to defer to the wisdom of others.  I'll do
what I can with the C code, and then I'll need help.

>
> Thanks,
> Ludo'.
>

Thanks for taking the time.

-Mike
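
For illustration, the fallback for ports without an encoding could look
roughly like the sketch below.  It only shows the ISO-8859-1 rule
described above: bytes map to codepoints unchanged, and codepoints above
255 are rejected.  The function names are hypothetical and are not part
of Guile's API; how this would hook into scm_getc and friends is left
open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not Guile functions. */

/* A Latin-1 byte is its own Unicode codepoint (0..255). */
static uint32_t latin1_to_codepoint(unsigned char byte)
{
    return (uint32_t) byte;
}

/* Store codepoint CP as a single Latin-1 byte in *OUT.
   Returns 0 on success, or -1 if CP is greater than 255 and
   therefore invalid for a port with no character encoding. */
static int codepoint_to_latin1(uint32_t cp, unsigned char *out)
{
    if (cp > 0xFF)
        return -1;
    *out = (unsigned char) cp;
    return 0;
}

/* Decode LEN bytes from SRC into codepoints in DST.
   DST must have room for LEN entries. */
static void latin1_decode(const unsigned char *src, size_t len, uint32_t *dst)
{
    assert(len == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < len; i++)
        dst[i] = latin1_to_codepoint(src[i]);
}
```

A port that does have an encoding (UTF-8, say) would need a real decoder
in place of latin1_to_codepoint; that part is not sketched here.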