From: Stephen Compall
Newsgroups: gmane.lisp.guile.devel
Subject: Which Encoding? (was Re: Unicode and Guile)
Date: 26 Oct 2003 12:34:47 +0000
References: <20031021171534.GA13246@lark> <200310260003.RAA10375@morrowfield.regexps.com>
Cc: guile-devel@gnu.org
Original-To: Tom Lord
In-Reply-To: <200310260003.RAA10375@morrowfield.regexps.com>

Tom Lord writes:

> It's culturally discriminatory to regard utf-16 as worse than utf-8
> in those regards.
>
> Or, put differently, for many potential users, utf-16 is the best of
> both worlds: it optimizes the size of the most common characters
> (for some users), and it can also handle any Unicode character.

That's the thing: it can't, at least not if you think in fixed-width
terms, which was my goal in suggesting UCS-4. It may be able to handle
all *current* Unicode characters, but what about those added in the
future? Unicode already defines code points above U+FFFF, which do not
fit in a single 16-bit unit.

I say it's the worst of both worlds (from the C API user's point of
view): you lose ASCII compatibility even for 7-bit code points, *and*
you still need surrogate pairs (i.e. variable width) for code points
above 65535. That is exactly the difference between UTF-16 and UCS-2.

UTF-16 suffers from the same problem as UTF-8: programmers may be
tempted to treat the data as a fixed-width array of 16-bit units (8-bit
bytes, for UTF-8), which breaks on surrogate pairs (on multibyte
sequences, for UTF-8).

If you are willing to assume that you will never need characters
outside the 16-bit range, then UCS-2 would be a much better choice than
UTF-16, IMHO. That way, it is clear that C programs need only deal with
fixed-width, 16-bit characters.
-- 
Stephen Compall or s11 or sirian

Since a politician never believes what he says, he is surprised when
others believe him.
		-- Charles DeGaulle

_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel