From: Andy Wingo
To: guile-devel
Newsgroups: gmane.lisp.guile.devel
Subject: need: scm_from_{utf8,latin1}_{string,symbol,keyword}
Date: Mon, 06 Sep 2010 13:23:15 +0200

Hi,

In our C source, we have been trained to use scm_from_locale_string et
al. This is usually the right thing to do when interacting with the
operating system.

However, when we have literals in C source code, I think this strategy
is incorrect. I write my C source code in UTF-8 or in ISO-8859-1, but if
the user is running in another locale, they will not load my
strings/symbols/keywords correctly.
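
To make the failure concrete, here is a minimal sketch in plain C with no libguile involved. It decodes the same bytes once as UTF-8 and once as ISO-8859-1. The names decode_utf8 and decode_latin1 are hypothetical helpers invented for this illustration, not Guile API, and the UTF-8 decoder is deliberately simplified (it does not reject overlong forms or surrogates).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not Guile API): decode LEN bytes as ISO-8859-1.
   Every byte maps to exactly one code point.  Returns the number of
   code points written to OUT.  */
size_t
decode_latin1 (const unsigned char *bytes, size_t len, uint32_t *out)
{
  for (size_t i = 0; i < len; i++)
    out[i] = bytes[i];
  return len;
}

/* Hypothetical helper (not Guile API): decode LEN bytes as UTF-8.
   Handles 1- to 4-byte sequences; simplified, so overlong forms and
   surrogates are not rejected.  Returns the number of code points
   written to OUT, or (size_t) -1 on malformed input.  */
size_t
decode_utf8 (const unsigned char *bytes, size_t len, uint32_t *out)
{
  size_t n = 0;
  size_t i = 0;

  while (i < len)
    {
      unsigned char b = bytes[i];
      uint32_t cp;
      size_t extra;   /* number of continuation bytes expected */

      if (b < 0x80)
        { cp = b; extra = 0; }
      else if ((b & 0xE0) == 0xC0)
        { cp = b & 0x1F; extra = 1; }
      else if ((b & 0xF0) == 0xE0)
        { cp = b & 0x0F; extra = 2; }
      else if ((b & 0xF8) == 0xF0)
        { cp = b & 0x07; extra = 3; }
      else
        return (size_t) -1;     /* stray continuation or invalid lead byte */

      if (extra > len - i - 1)
        return (size_t) -1;     /* sequence truncated */

      for (size_t k = 1; k <= extra; k++)
        {
          unsigned char c = bytes[i + k];
          if ((c & 0xC0) != 0x80)
            return (size_t) -1; /* not a continuation byte */
          cp = (cp << 6) | (c & 0x3F);
        }

      out[n++] = cp;
      i += extra + 1;
    }
  return n;
}
```

Given the two bytes 0xC3 0xA9, which is how a UTF-8 source file stores a literal "é", decode_utf8 yields the single code point U+00E9, while decode_latin1 yields two code points, U+00C3 and U+00A9. A locale-dependent conversion picks between interpretations like these based on the user's environment rather than on how the source file was written.
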

The solution is to use functions that specify the encoding explicitly
instead of taking it from the locale. We don't have those yet, but we do
have the capability to write them now. Specifically:

  scm_from_utf8_string
  scm_from_utf8_symbol
  scm_from_utf8_keyword
  scm_from_latin1_string
  scm_from_latin1_symbol
  scm_from_latin1_keyword

We probably also need the "n" variants.

It's unlikely that you have a known utf-32 string as a char*, but we
should probably also provide scm_t_uint16* and scm_t_uint32* variants
for utf16 and utf32.

                               * * *

We also have the converse problem: since the easiest (and recommended)
way to get a char* from a Scheme string has been scm_to_locale_string,
in many cases we give external libraries locale-encoded strings instead
of the encoding they expect.

For example, most GLib-based libraries expect utf-8 strings, but
Guile-GNOME ignorantly passes them the result of calling
scm_to_locale_string. Though this will work in UTF-8 locales, it's only
by accident.

So then we need, I think:

  scm_to_utf8_string
  scm_to_utf16_string
  scm_to_utf32_string

We need the "n" variants here too (perhaps more).

What do people think? Any takers on implementing this? :)

Cheers,

Andy
-- 
http://wingolog.org/