From: Mark H Weaver
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Using libunistring for string comparisons et al
Date: Thu, 17 Mar 2011 11:38:01 -0400
Message-ID: <87tyf1kbae.fsf@netris.org>
To: ludo@gnu.org (Ludovic Courtès)
Cc: guile-devel@gnu.org
In-Reply-To: <87oc5b8fx3.fsf@gnu.org> (Ludovic Courtès's message of "Wed, 16 Mar 2011 12:26:32 +0100")

I have a compromise proposal, which could be implemented for 2.0.x:

We keep wide (UTF-32) stringbufs as-is, but we change narrow stringbufs to UTF-8, along with a flag that indicates whether it is known to be ASCII-only. Applying string-ref or string-set! to a narrow stringbuf would upgrade it to a wide stringbuf, unless it is known to be ASCII-only.

Better yet, string-ref should do this only when the index is above a certain threshold value, and string-set! should do this only for stringbufs longer than a certain threshold length.

This would keep our accessors O(1), but also ensure that most stringbufs are narrow. This is important not only for optimal memory usage, but also because it means we don't have to worry so much about optimizing the narrow-wide cases: then we can handle those cases by widening or narrowing to make them the same width, and then calling libunistring.
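
The accessor policy described above can be modelled in a few lines. This is only a toy sketch in Python, not Guile's C implementation: the names `StringBuf`, `make_narrow`, `widen`, `string_ref`, `string_set` and both threshold constants are hypothetical, and the threshold values are arbitrary because the proposal leaves them open. How to handle storing a non-ASCII character into an ASCII-only buffer is also an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical thresholds; the proposal does not fix their values.
REF_INDEX_THRESHOLD = 16   # string-ref widens only at or above this index
SET_LENGTH_THRESHOLD = 16  # string-set! widens only above this byte length


@dataclass
class StringBuf:
    # Narrow: UTF-8 bytes. Wide: list of code points (stands in for UTF-32).
    data: Union[bytearray, List[int]]
    wide: bool = False
    ascii_only: bool = False  # only meaningful while the buffer is narrow


def make_narrow(s: str) -> StringBuf:
    return StringBuf(bytearray(s.encode("utf-8")), ascii_only=s.isascii())


def widen(buf: StringBuf) -> None:
    """Upgrade a narrow buffer to wide in place; O(n), a no-op if already wide."""
    if not buf.wide:
        buf.data = [ord(c) for c in buf.data.decode("utf-8")]
        buf.wide = True
        buf.ascii_only = False


def _char_len(lead: int) -> int:
    """Byte length of a UTF-8 sequence, given its lead byte."""
    if lead < 0x80:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4


def string_ref(buf: StringBuf, k: int) -> str:
    if buf.wide or buf.ascii_only:
        return chr(buf.data[k])      # fixed-width element: direct index
    if k < REF_INDEX_THRESHOLD:
        i = 0                        # bounded scan over at most k characters
        for _ in range(k):
            i += _char_len(buf.data[i])
        return buf.data[i:i + _char_len(buf.data[i])].decode("utf-8")
    widen(buf)                       # large index: pay O(n) once, then O(1)
    return chr(buf.data[k])


def string_set(buf: StringBuf, k: int, ch: str) -> None:
    if not buf.wide and buf.ascii_only and ch.isascii():
        buf.data[k] = ord(ch)        # in-place byte store, stays narrow
        return
    if not buf.wide and len(buf.data) <= SET_LENGTH_THRESHOLD:
        chars = list(buf.data.decode("utf-8"))
        chars[k] = ch                # short buffer: re-encode, stays narrow
        text = "".join(chars)
        buf.data = bytearray(text.encode("utf-8"))
        buf.ascii_only = text.isascii()
        return
    widen(buf)                       # long buffer: upgrade, then store
    buf.data[k] = ord(ch)
```

In this model an ASCII-only buffer is never widened by a read, a short or low-index access on a non-ASCII narrow buffer costs a scan bounded by the threshold, and only a high-index read or a write to a long buffer triggers the one-time upgrade to wide.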

In the eventual common case, where string-ref and string-set! are rarely called, almost all stringbufs would be narrow, so converting to UTF-8 becomes an O(1) operation.

What do you think?

    Mark