unofficial mirror of guile-devel@gnu.org 
From: Stephen Compall <s11@member.fsf.org>
Cc: guile-devel@gnu.org
Subject: Which Encoding? (was Re: Unicode and Guile)
Date: 26 Oct 2003 12:34:47 +0000
Message-ID: <xfyd6ckb2jc.fsf_-_@csserver.evansville.edu>
In-Reply-To: <200310260003.RAA10375@morrowfield.regexps.com>

Tom Lord <lord@emf.net> writes:

> It's culturally discriminatory to regard utf-16 as worse than utf-8
> in those regards.
> 
> Or, put differently, for many potential users, utf-16 is the best of
> both worlds: it optimizes the size of the most common characters
> (for some users), and it can also handle any Unicode character.

That's the thing -- it can't, at least not thinking in fixed-width
terms, which was my goal in suggesting UCS-4.  It may be able to
handle all *current* Unicode characters, but what about those in the
future?  Unicode defines code points that do not fit in 16 bits.

I say it's the worst of both worlds (from the C API user's point of
view), because you have to deal with breaking ASCII compatibility for
7-bit code points, *and* still need surrogate pairs (i.e. variable
width) for code points above 65535 (which is the difference between
UTF-16 and UCS-2).
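
[Illustrative sketch, not part of the original message.]  The
variable-width point can be seen in a minimal UTF-16 encoder.  The
function name utf16_encode is hypothetical; the surrogate ranges
(0xD800-0xDBFF high, 0xDC00-0xDFFF low) and the 0x10FFFF ceiling are
the ones UTF-16 itself defines.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
 * Returns the number of 16-bit code units written (1 or 2), or 0 if
 * the value cannot be encoded (a surrogate value, or > 0x10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* reserved for surrogates */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* one unit, identical to UCS-2 */
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                       /* outside the UTF-16 range */

    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

Under this sketch, 'A' (U+0041) becomes the single unit 0x0041, which
is two bytes rather than one ASCII byte, while U+1D11E becomes the
two units 0xD834 0xDD1E.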

UTF-16 suffers the same problem as UTF-8: programmers may be tempted
to simply treat the data block as a fixed-width string of 16-bit units
(8-bit for UTF-8), which will break on surrogate pairs (or on
multi-byte sequences, for UTF-8).
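
[Illustrative sketch, not part of the original message.]  The
fixed-width temptation shows up as soon as one counts characters.
The function below is a hypothetical helper that counts code points
in a UTF-16 buffer; treating the buffer as fixed-width would simply
report the number of 16-bit units instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count code points in a UTF-16 buffer of n
 * code units.  A high surrogate followed by a low surrogate counts
 * once; an unpaired surrogate is counted as one unit on its own. */
size_t utf16_count_code_points(const uint16_t *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&
            i + 1 < n &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i++;                        /* skip the low half of the pair */
        count++;
    }
    return count;
}
```

For the four units 0x0041 0xD834 0xDD1E 0x0042 this returns 3, so
code that indexes s[2] expecting "the third character" lands in the
middle of a pair.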

If you want to assume that Unicode will never grow out of the 16-bit
set, then UCS-2 would be a much better choice than UTF-16, IMHO.  That
way, it is clear that C programs only need deal with fixed-width,
16-bit characters.

--
Stephen Compall or s11 or sirian

Since a politician never believes what he says, he is surprised
when others believe him.
		-- Charles DeGaulle





