From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andy Wingo
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Wide strings
Date: Wed, 28 Jan 2009 19:36:17 +0100
References: <470889.75847.qm@web37904.mail.mud.yahoo.com>
 <87wscjvwyq.fsf@gnu.org>
 <437818.2998.qm@web37907.mail.mud.yahoo.com>
 <87pri9lpab.fsf@gnu.org>
 <142660.24551.qm@web37906.mail.mud.yahoo.com>
 <87ljswk21l.fsf@gnu.org>
 <591698.58378.qm@web37905.mail.mud.yahoo.com>
In-Reply-To: <591698.58378.qm@web37905.mail.mud.yahoo.com> (Mike Gran's
 message of "Wed, 28 Jan 2009 08:44:15 -0800 (PST)")
To: Mike Gran
Cc: guile-devel@gnu.org

Hi,

On Wed 28 Jan 2009 17:44, Mike Gran writes:

> Since I need this functionality taken care of, and since I have some
> time to play with it, what's the procedure here?

The best thing IMO would be to hack on it on a Git branch, with small
and correct patches. We could get you commit access if you don't already
have it (Ludo or Neil would have to reply on that). Then you could push
your work directly to a branch, so we all can review it easily.

> Do we need to talk more about what needs to be accomplished? Do we
> need a complete specification? Do we need a vote on if it is a good
> idea?

I think you're going in the right direction. More importantly, although
I can't speak for them, Neil and Ludo seem to think so too.

> 1. Convert the internal char and string representation to be
> explicitly ISO 8859-1. Add the to/from locale conversion functionality
> while still retaining 8-bit strings. Replace C library funcs with
> Gnulib string funcs where appropriate.

Sounds appropriate to me. I am unfamiliar with the gnulib code; where do
the Unicode codepoint tables live? How does one update them? Do we get
full introspection on characters and their classes, properties, etc?

> 2. Convert the internal representation of chars to 4-byte
> codepoints, while still retaining 8-bit strings.

Currently, characters are immediate values, with an 8-bit tag. See
tags.h:333. So it seems we have 24 bits remaining, and Unicode claims
that 21 bits are the minimum necessary -- so we're good, if you can
figure out a reasonable way to go from a 32-bit codepoint to a 24-bit
codepoint.

> 3. Convert strings to be a union of 1 byte and 4 byte chars.

There's room on stringbufs to have a flag, I think. Dunno if that's the
right way to do it.

Converting the symbols and keywords code to do the right thing will be a
little bit of work, too.

Happy hacking,

Andy
-- 
http://wingolog.org/
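
A note on the bit budget from point 2, as a minimal sketch in C. It
assumes a 32-bit immediate word with the tag in the low 8 bits. The tag
value and every name here (CHAR_TAG, imm_t, make_char, is_char,
char_value) are made up for illustration and are not taken from Guile's
tags.h. The highest Unicode code point, 0x10FFFF, fits in 21 bits, so it
fits in the 24 bits left above the tag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, NOT Guile's real one: low 8 bits hold the type
   tag, the upper 24 bits hold the code point. */
#define CHAR_TAG       0x0cu       /* made-up tag value */
#define TAG_BITS       8
#define TAG_MASK       0xffu
#define MAX_CODEPOINT  0x10ffffu   /* highest Unicode code point (21 bits) */

typedef uint32_t imm_t;            /* stand-in for an immediate word */

/* Pack a code point into a tagged immediate. */
static imm_t make_char(uint32_t cp)
{
  assert(cp <= MAX_CODEPOINT);     /* anything larger is not Unicode */
  return (cp << TAG_BITS) | CHAR_TAG;
}

/* Check whether an immediate carries the character tag. */
static int is_char(imm_t x)
{
  return (x & TAG_MASK) == CHAR_TAG;
}

/* Recover the code point from a tagged immediate. */
static uint32_t char_value(imm_t x)
{
  assert(is_char(x));
  return x >> TAG_BITS;
}
```

With this layout no remapping step is needed when going from a 32-bit
codepoint to the 24-bit payload: a range check against 0x10FFFF is
enough, since valid code points never use the top bits.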