From: Mike Gran
Newsgroups: gmane.lisp.guile.devel
Subject: Unicode, ports and encoding
Date: Mon, 16 Feb 2009 15:51:33 -0800 (PST)
Message-ID: <550226.89448.qm@web37908.mail.mud.yahoo.com>
To: guile-devel

More observations about wide strings and Guile.

First, here are the abridged call trees for low-level reading and writing:

    read <-+- scm_getc <-+- [the parser] <--- scm_read <--- scm_primitive_load
           |             |
           |             +- scm_read_char
           |
           +- scm_c_read
           |
           +- read_without_guile

    write <-+- scm_lfwrite <-+- scm_display
            |                |
            |                +- scm_putc <-+- scm_write_char
            |                              |
            |                              +- scm_newline
            |
            +- scm_flush

1. To move to a Unicode-enabled Guile, text needs to be converted to an internal representation when it is read, and converted back to the locale's encoding when it is written. Most port reading and writing passes through scm_getc (input) and scm_lfwrite (output), so conversion between locale-encoded strings and internal strings should happen there.

2. If string conversion occurs in scm_getc, then the scm_read reader will be receiving and parsing source code that has already passed through the conversion routines.
This is not a problem at first, since Scheme code is largely ASCII and Guile will start up in the C locale. But if a source file is not ASCII, the reader needs to be able to determine its encoding before parsing the code in the file. The encoding of a source file is a property of the file, not of the locale in which Guile happens to be running. This implies that a source file should have syntax to declare its own encoding when it is not ASCII, something akin to the charset declaration on the <meta> line of an HTML file.

3. The text encoding of a port needs to be associated with the port. R6RS has the idea of transcoders for ports that require conversion. It is daunting, but after playing with some ideas for a few weeks, I have come to think that at least a subset of the transcoder functionality needs to be implemented for this to make any sense.

I sent in my copyright assignment last week, so you should have it now.

Thanks,
Mike Gran
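Point 1 puts the input-side conversion in scm_getc. The following is a minimal sketch of that decoding step, assuming the port's external encoding is UTF-8 rather than an arbitrary locale encoding. The names struct byte_port, port_getc, PORT_EOF and PORT_ERROR are hypothetical and are not Guile's port API; the sketch only shows raw bytes going in and whole code points coming out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-level port state; NOT Guile's scm_t_port, only
   enough to show where decoding would sit. */
struct byte_port {
  const unsigned char *buf;  /* raw bytes as delivered by read() */
  size_t len;
  size_t pos;
};

#define PORT_EOF   (-1L)
#define PORT_ERROR (-2L)  /* malformed or truncated UTF-8; pos is not advanced */

/* Return the next character as a Unicode code point, assuming the
   port's bytes are UTF-8.  A Unicode-aware scm_getc would do this so
   that its callers (the reader, scm_read_char) only see decoded text. */
long port_getc (struct byte_port *p)
{
  static const uint32_t min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
  unsigned char b0;
  size_t extra;
  uint32_t cp;

  assert (p != NULL);
  if (p->pos >= p->len)
    return PORT_EOF;

  b0 = p->buf[p->pos];
  if (b0 < 0x80)                { extra = 0; cp = b0; }
  else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; }
  else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
  else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
  else
    return PORT_ERROR;  /* stray continuation byte or invalid lead byte */

  if (extra > p->len - p->pos - 1)
    return PORT_ERROR;  /* sequence truncated by end of input */

  for (size_t i = 1; i <= extra; i++)
    {
      unsigned char b = p->buf[p->pos + i];
      if ((b & 0xC0) != 0x80)
        return PORT_ERROR;  /* expected a continuation byte */
      cp = (cp << 6) | (b & 0x3F);
    }

  /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF. */
  if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return PORT_ERROR;

  p->pos += extra + 1;
  return (long) cp;
}
```

A real port would refill its buffer from the underlying file descriptor instead of reporting an error when a multibyte sequence straddles the end of the buffer; the sketch treats the buffer as the whole input to stay short.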
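Point 2 asks for in-file syntax that declares a source file's encoding. One possible shape is sketched below, assuming an Emacs-style "coding:" cookie (for example ";; -*- coding: utf-8 -*-") somewhere in the first two lines. The cookie syntax, the two-line limit and the function name find_coding_declaration are all assumptions for illustration. The scan runs on raw, undecoded bytes, which is the ordering point 2 requires: the encoding must be known before the reader sees converted text.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Scan the first two lines of SRC (raw bytes) for a hypothetical
   "coding: NAME" declaration.  On success copy NAME into OUT (at most
   OUT_SIZE - 1 bytes plus a terminating NUL) and return 1.  Return 0
   if no declaration is found, meaning the caller should assume ASCII. */
int find_coding_declaration (const char *src, size_t len,
                             char *out, size_t out_size)
{
  static const char key[] = "coding:";
  const size_t keylen = sizeof key - 1;
  size_t line = 0;

  assert (src != NULL && out != NULL && out_size > 0);

  for (size_t i = 0; i < len && line < 2; i++)
    {
      if (src[i] == '\n')
        {
          line++;  /* only the first two lines are examined */
          continue;
        }
      if (i + keylen <= len && memcmp (src + i, key, keylen) == 0)
        {
          size_t j = i + keylen, n = 0;

          while (j < len && (src[j] == ' ' || src[j] == '\t'))
            j++;  /* skip blanks after the colon */
          /* Encoding names are taken to be letters, digits, '-' and '_'. */
          while (j < len && n + 1 < out_size
                 && (isalnum ((unsigned char) src[j])
                     || src[j] == '-' || src[j] == '_'))
            out[n++] = src[j++];
          out[n] = '\0';
          return n > 0;
        }
    }
  return 0;
}
```

Because the cookie itself is plain ASCII, it can be located in the raw bytes whatever the file's actual encoding is, provided that encoding is ASCII-compatible (UTF-8 or ISO-8859-x, but not UTF-16). A real implementation would also have to avoid matching "coding:" inside an ordinary string literal.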
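Point 3 says a port's encoding should live with the port. The sketch below shows an output port that carries its own codec and error-handling mode, loosely modeled on those two parts of an R6RS transcoder (R6RS transcoders also carry an end-of-line style, omitted here). Everything in it is hypothetical: struct out_port, port_putc, the enums, and the choice of '?' as the replacement byte. A Unicode-aware scm_lfwrite could call something like it once per character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; not an existing Guile API. */
enum codec { CODEC_LATIN1, CODEC_UTF8 };
enum error_mode { ERRORS_RAISE, ERRORS_REPLACE };

struct out_port {
  enum codec codec;          /* the port's encoding, fixed when opened */
  enum error_mode on_error;  /* what to do with unencodable characters */
  unsigned char buf[256];    /* encoded bytes waiting to be written */
  size_t len;
};

/* Encode code point CP into the port's buffer using the port's own
   codec.  Return 0 on success, or -1 if CP is not a Unicode scalar
   value, cannot be encoded under ERRORS_RAISE, or the buffer is full. */
int port_putc (struct out_port *p, uint32_t cp)
{
  unsigned char tmp[4];
  size_t n = 0;

  assert (p != NULL);
  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return -1;  /* not a Unicode scalar value */

  if (p->codec == CODEC_LATIN1)
    {
      if (cp <= 0xFF)
        tmp[n++] = (unsigned char) cp;
      else if (p->on_error == ERRORS_REPLACE)
        tmp[n++] = '?';  /* substitute instead of failing */
      else
        return -1;
    }
  else  /* CODEC_UTF8 */
    {
      if (cp < 0x80)
        tmp[n++] = (unsigned char) cp;
      else if (cp < 0x800)
        {
          tmp[n++] = (unsigned char) (0xC0 | (cp >> 6));
          tmp[n++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
      else if (cp < 0x10000)
        {
          tmp[n++] = (unsigned char) (0xE0 | (cp >> 12));
          tmp[n++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          tmp[n++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
      else
        {
          tmp[n++] = (unsigned char) (0xF0 | (cp >> 18));
          tmp[n++] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
          tmp[n++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          tmp[n++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
    }

  if (p->len + n > sizeof p->buf)
    return -1;  /* a real port would flush here instead */
  for (size_t i = 0; i < n; i++)
    p->buf[p->len++] = tmp[i];
  return 0;
}
```

Keeping the codec on the port, rather than consulting the current locale at write time, is what lets two ports in the same process use different encodings.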