From: Mark H Weaver
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Using libunistring for string comparisons et al
Date: Tue, 15 Mar 2011 13:20:54 -0400
Message-ID: <878vwgmhah.fsf@netris.org>
References: <336042.33326.qm@web37901.mail.mud.yahoo.com>
In-Reply-To: <336042.33326.qm@web37901.mail.mud.yahoo.com> (Mike Gran's message of "Sat, 12 Mar 2011 13:28:05 -0800 (PST)")
To: Mike Gran
Cc: Ludovic Courtès, guile-devel@gnu.org

Mike Gran writes:

> We do, in a matter of speaking, have a single string representation:
> UTF-32. The 'narrow' encoding is UTF-32 with the initial 3 bytes of
> zero removed.

Despite the similarity of these two representations, they are
sufficiently different that they cannot be handled by the same machine
code. That means you must either implement multiple inner loops, one
for each combination of string parameter representations, or else you
must dispatch on the string representation within the inner loop. On
modern architectures, wrongly predicted conditional branches are very
expensive.

> I actually at one point had a nearly complete version of Guile 1.8
> that used UTF-8 and another that used UTF-32. There are some
> other reasons why UTF-8 is bad, which I could bore you with
> ad naseum.

Can you please tell me why UTF-8 is bad, or point me to something that
explains it? Everything I have found suggests that UTF-8 is very good.

Thanks,
Mark
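
To make the two options for the inner loop concrete, here is a minimal sketch in C of an equality test over strings that may be narrow (one byte per character) or wide (four bytes per character). The type `str_t` and every function name are hypothetical, invented for this illustration; this is not Guile's actual string structure or API. The first version dispatches on the representation for every character read; the second dispatches once and then runs a loop specialised for that combination of representations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string type (an assumption for illustration only). */
typedef struct {
    int wide;          /* 0: uint8_t per character, 1: uint32_t per character */
    size_t len;        /* length in characters */
    const void *data;  /* points to uint8_t[len] or uint32_t[len] */
} str_t;

/* Option 1: dispatch on the representation inside the inner loop. */
static uint32_t str_ref(const str_t *s, size_t i)
{
    return s->wide ? ((const uint32_t *)s->data)[i]
                   : ((const uint8_t *)s->data)[i];
}

int str_equal_dispatch_inside(const str_t *a, const str_t *b)
{
    if (a->len != b->len)
        return 0;
    for (size_t i = 0; i < a->len; i++)
        if (str_ref(a, i) != str_ref(b, i))  /* two branches per character */
            return 0;
    return 1;
}

/* Option 2: one inner loop per combination of representations. */
#define DEFINE_EQ(name, TA, TB)                              \
    static int name(const TA *a, const TB *b, size_t n)      \
    {                                                        \
        for (size_t i = 0; i < n; i++)                       \
            if ((uint32_t)a[i] != (uint32_t)b[i])            \
                return 0;                                    \
        return 1;                                            \
    }

DEFINE_EQ(eq_narrow_narrow, uint8_t, uint8_t)
DEFINE_EQ(eq_narrow_wide, uint8_t, uint32_t)
DEFINE_EQ(eq_wide_wide, uint32_t, uint32_t)

int str_equal_dispatch_outside(const str_t *a, const str_t *b)
{
    if (a->len != b->len)
        return 0;
    /* The representation is examined once, outside the loops. */
    if (!a->wide && !b->wide)
        return eq_narrow_narrow(a->data, b->data, a->len);
    if (!a->wide && b->wide)
        return eq_narrow_wide(a->data, b->data, a->len);
    if (a->wide && !b->wide)
        return eq_narrow_wide(b->data, a->data, a->len);
    return eq_wide_wide(a->data, b->data, a->len);
}
```

Both functions return the same result for the same inputs; they differ only in where the representation check happens. With two representations, a two-argument operation needs up to four specialised loops (three here, because equality is symmetric). Whether either version is measurably faster depends on the compiler, the hardware, and the workload, so this sketch shows only the structural trade-off and makes no performance claim.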