From: Mike Gran <spk121@yahoo.com>
To: guile-devel <guile-devel@gnu.org>
Subject: Unicode, ports and encoding
Date: Mon, 16 Feb 2009 15:51:33 -0800 (PST)
Message-ID: <550226.89448.qm@web37908.mail.mud.yahoo.com>
More observations about wide strings and Guile.
First, here are the abridged call trees for low-level reading and
writing.
read <-+- scm_getc <-+- [the parser] <--- scm_read <--- scm_primitive_load
       |             |
       |             +- scm_read_char
       |
       +- scm_c_read
       |
       +- read_without_guile

write <-+- scm_lfwrite <-+- scm_display
        |                |
        |                +- scm_putc <-+- scm_write_char
        |                              |
        |                              +- scm_newline
        |
        +- scm_flush
1. To move to a Unicode-enabled Guile, text needs to be
converted to an internal representation when read and converted
back to the locale when written. Most reading and writing for
ports passes through scm_getc (input) and scm_lfwrite (output).
Conversion between locale strings and internal strings should
happen there.
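
To make the idea concrete, here is a minimal sketch of conversion at
the getc/lfwrite boundary. It assumes, purely for illustration, that
the external encoding is UTF-8 and that the internal representation is
one 32-bit code point per character. The names sketch_input,
sketch_getc and sketch_putc are hypothetical and are not Guile's
actual port structures or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a port's input buffer (not Guile's port type). */
struct sketch_input {
    const unsigned char *buf;
    size_t len;
    size_t pos;
};

/* Read one UTF-8 sequence and return its code point, the way a converting
   scm_getc might.  Returns -1 at end of input or on a malformed sequence.
   Overlong forms and surrogates are not rejected, to keep the sketch short. */
static int32_t sketch_getc(struct sketch_input *in)
{
    assert(in != NULL);
    if (in->pos >= in->len)
        return -1;

    unsigned char b = in->buf[in->pos++];
    int extra;      /* number of continuation bytes still expected */
    int32_t cp;     /* code point being assembled */

    if (b < 0x80)
        return (int32_t) b;
    else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }
    else
        return -1;

    while (extra-- > 0) {
        if (in->pos >= in->len || (in->buf[in->pos] & 0xC0) != 0x80)
            return -1;
        cp = (cp << 6) | (in->buf[in->pos++] & 0x3F);
    }
    return cp;
}

/* Encode one code point as UTF-8 into out (room for 4 bytes), the way a
   converting scm_lfwrite might.  Returns the number of bytes written,
   or 0 if the code point is out of range. */
static size_t sketch_putc(int32_t cp, unsigned char *out)
{
    if (cp < 0 || cp > 0x10FFFF)
        return 0;
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}
```

Everything above the boundary (the parser, scm_read_char, scm_display)
would then see only code points, and only these two functions would
need to know about the external encoding.
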
2. If string conversion occurs in scm_getc, then the scm_read reader
will be receiving and parsing source code that has passed through
the conversion routines. This is initially not a problem since
Scheme code is largely ASCII, and Guile will start up in the C
locale.
But, if a source code file is not ASCII, the reader needs to be
able to ascertain this before parsing the code from the file. The
encoding of a source code file is a property of the file and not
the locale in which Guile is being run.
This implies that a source code file should have syntax to
indicate its own encoding, if it is not ASCII. Something akin to
the <?xml version="1.0" encoding="utf-8"?> declaration at the top
of XML files.
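
As a rough sketch of how the reader could find such a declaration
before it starts parsing: scan the raw leading bytes of the file for an
ASCII marker and pull out the name that follows. The "coding:" marker
and the function name sketch_scan_coding are assumptions made up for
this example, not an agreed syntax. Because the marker is plain ASCII,
the scan can run before any conversion, though that only works for
ASCII-compatible encodings (it would not find the marker in UTF-16).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Scan the first `len` bytes of a source file for the hypothetical marker
   "coding:" and copy the encoding name that follows it into `name`.
   Returns 1 if a name was found, 0 otherwise (the caller would then
   assume ASCII).  Only the first occurrence of the marker is examined. */
static int sketch_scan_coding(const char *head, size_t len,
                              char *name, size_t name_size)
{
    static const char marker[] = "coding:";
    const size_t mlen = sizeof marker - 1;

    assert(head != NULL && name != NULL && name_size > 0);

    for (size_t i = 0; i + mlen <= len; i++) {
        if (memcmp(head + i, marker, mlen) != 0)
            continue;

        /* Skip blanks between the marker and the encoding name. */
        size_t j = i + mlen;
        while (j < len && (head[j] == ' ' || head[j] == '\t'))
            j++;

        /* Copy name characters, leaving room for the terminating NUL. */
        size_t n = 0;
        while (j < len && n + 1 < name_size
               && (isalnum((unsigned char) head[j])
                   || head[j] == '-' || head[j] == '_'))
            name[n++] = head[j++];
        name[n] = '\0';
        return n > 0;
    }
    return 0;
}
```

With a file that begins with the comment ";;; coding: utf-8", the
function fills name with "utf-8"; the loader could then set up the
conversion for that file before handing it to scm_read.
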
3. The text encoding of a port needs to be associated with the port.
R6RS has the idea of transcoders for ports that require
conversion. It is daunting, but, having played with some ideas for a
few weeks, it seems that at least a subset of the transcoder
functionality needs to be implemented for this to make any sense.
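
A bare-bones illustration of what "encoding associated with the port"
could look like: the port carries an encoding tag, and the read routine
consults it for every byte. The types and names here (sketch_port,
sketch_encoding, sketch_port_getc) are hypothetical, and only two
single-byte encodings are handled. A real R6RS transcoder also bundles
an end-of-line style and an error-handling mode, which this sketch
leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port encoding tag. */
enum sketch_encoding { SKETCH_ASCII, SKETCH_LATIN1 };

/* Hypothetical port: a byte buffer plus the encoding of those bytes. */
struct sketch_port {
    const unsigned char *buf;
    size_t len;
    size_t pos;
    enum sketch_encoding encoding;
};

/* Return the next character as a code point, decoded according to the
   port's own encoding.  Returns -1 at end of input and -2 if the byte
   is not valid in that encoding. */
static int32_t sketch_port_getc(struct sketch_port *port)
{
    assert(port != NULL);
    if (port->pos >= port->len)
        return -1;

    unsigned char b = port->buf[port->pos++];
    switch (port->encoding) {
    case SKETCH_ASCII:
        /* ASCII defines only the values 0..127. */
        return b < 0x80 ? (int32_t) b : -2;
    case SKETCH_LATIN1:
        /* Latin-1 bytes map directly to code points U+0000..U+00FF. */
        return (int32_t) b;
    }
    return -2;
}
```

The same byte 0xE9 read through two ports gives two different answers:
an error on the ASCII port and U+00E9 on the Latin-1 port. That is why
the encoding has to travel with the port rather than with the process
locale.
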
I sent in my copyright assignment last week, so you should have it
now.
Thanks,
Mike Gran