From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Mark H Weaver
Newsgroups: gmane.lisp.guile.devel
Subject: [PATCH] Fix several POSIX functions to use the locale encoding
Date: Sun, 01 May 2011 20:39:55 -0400
Message-ID: <87sjsyszok.fsf@netris.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
To: guile-devel@gnu.org
List-Id: "Developers list for Guile, the GNU extensibility library"
Xref: news.gmane.org gmane.lisp.guile.devel:12398

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Hello all,

GoKhlaYeh on #guile reported that although (system "echo hé") works
properly, (system* "/bin/sh" "-c" "echo hé") fails. Upon investigation,
I found that although Guile uses the locale encoding almost everywhere
when interfacing with C, several POSIX functions use Latin-1 in a few
places.

In particular: `system*', `execl', `execlp', `execle', `environ', and
`dynamic-args-call' use scm_i_allocate_string_pointers to convert a list
of SCM strings into an argv-style array of C strings. That function
does a simple memcpy for narrow strings, and throws an exception for
wide strings (via scm_i_string_chars).

`environ' was particularly broken, in that it would use Latin-1 when
setting the environment, and the locale encoding when reading the
environment. The `exec' functions would use the locale encoding to
encode the program name, and Latin-1 for the arguments and environment.
`system*' would use Latin-1 for everything.

This patch fixes all of these inconsistencies, by modifying
scm_i_allocate_string_pointers to use the locale encoding instead of
Latin-1.

Any objections to pushing this to stable-2.0?

Mark

--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline;
 filename=0001-Fix-several-POSIX-functions-to-use-the-locale-encodi.patch
Content-Description: Fix several POSIX functions to use the locale encoding

>From 00691082eb781ce9cad8da156ad170164b2cf5fc Mon Sep 17 00:00:00 2001
From: Mark H Weaver
Date: Sun, 1 May 2011 20:12:35 -0400
Subject: [PATCH] Fix several POSIX functions to use the locale encoding

* libguile/strings.c (scm_i_allocate_string_pointers): Encode strings
  using the current locale. Previously, Latin-1 was used. Indirectly,
  this affects the encoding of strings in `system*', `execl', `execlp',
  `execle', `environ', and `dynamic-args-call'.

  (scm_makfromstrs): In header comment, clarify that the C strings are
  interpreted according to the current locale encoding.

* NEWS: Add NEWS entry.
---
 NEWS               |   12 ++++++++++++
 libguile/strings.c |   33 +++++++++++++++++++--------------
 2 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/NEWS b/NEWS
index df6de65..e2495da 100644
--- a/NEWS
+++ b/NEWS
@@ -4,7 +4,19 @@ See the end for copying conditions.
 
 Please send Guile bug reports to bug-guile@gnu.org.
 
+Changes in 2.0.2 (since 2.0.1):
 
+* Bugs fixed
+
+** Fixed several POSIX functions to use the locale encoding
+
+Fixed `system*', `execl', `execlp', `execle', `environ', and
+`dynamic-args-call' to encode all strings using the current locale
+encoding. Previously, Latin-1 was used to encode strings in several
+places, regardless of the locale.
+
+
+
 Changes in 2.0.1 (since 2.0.0):
 
 * Notable changes
diff --git a/libguile/strings.c b/libguile/strings.c
index bf63704..a70cb8d 100644
--- a/libguile/strings.c
+++ b/libguile/strings.c
@@ -2052,8 +2052,9 @@ SCM_DEFINE (scm_string_normalize_nfkd, "string-normalize-nfkd", 1, 0, 0,
 }
 #undef FUNC_NAME
 
-/* converts C scm_array of strings to SCM scm_list of strings. */
-/* If argc < 0, a null terminated scm_array is assumed. */
+/* converts C scm_array of strings to SCM scm_list of strings.
+   If argc < 0, a null terminated scm_array is assumed.
+   The current locale encoding is assumed */
 SCM
 scm_makfromstrs (int argc, char **argv)
 {
@@ -2067,37 +2068,41 @@ scm_makfromstrs (int argc, char **argv)
 }
 
 /* Return a newly allocated array of char pointers to each of the strings
-   in args, with a terminating NULL pointer. */
-
+   in args, with a terminating NULL pointer. The strings are encoded using
+   the current locale. */
 char **
 scm_i_allocate_string_pointers (SCM list)
 #define FUNC_NAME "scm_i_allocate_string_pointers"
 {
   char **result;
-  int len = scm_ilength (list);
+  int list_len = scm_ilength (list);
   int i;
 
-  if (len < 0)
+  if (list_len < 0)
     scm_wrong_type_arg_msg (NULL, 0, list, "proper list");
 
-  result = scm_gc_malloc ((len + 1) * sizeof (char *),
+  result = scm_gc_malloc ((list_len + 1) * sizeof (char *),
                           "string pointers");
-  result[len] = NULL;
+  result[list_len] = NULL;
 
   /* The list might be have been modified in another thread, so
      we check LIST before each access.
    */
-  for (i = 0; i < len && scm_is_pair (list); i++)
+  for (i = 0; i < list_len && scm_is_pair (list); i++)
     {
-      SCM str;
+      SCM str = SCM_CAR (list);
       size_t len;
+      char *c_str = scm_to_locale_stringn (str, &len);
 
-      str = SCM_CAR (list);
-      len = scm_c_string_length (str);
-
+      /* OPTIMIZE-ME: Right now, scm_to_locale_stringn always uses
+         scm_malloc to allocate the returned string, which must be
+         explicitly deallocated. This forces us to copy the string a
+         second time into a new buffer. Ideally there would be variants
+         of scm_to_*_stringn that can return garbage-collected buffers. */
       result[i] = scm_gc_malloc_pointerless (len + 1, "string pointers");
-      memcpy (result[i], scm_i_string_chars (str), len);
+      memcpy (result[i], c_str, len);
       result[i][len] = '\0';
+      free (c_str);
 
       list = SCM_CDR (list);
     }
-- 
1.7.1

--=-=-=--
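
To make the failure reported above concrete, here is a minimal standalone
sketch in plain C, with no dependency on libguile. It assumes a UTF-8
locale, which the report does not state explicitly. The helpers
encode_latin1 and encode_utf8 are hypothetical names invented for this
illustration; they are not Guile functions. The first one mimics what a
byte-for-byte memcpy of a narrow string yields, the second what a
conversion to a UTF-8 locale encoding would yield.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not a Guile API: encode code points as Latin-1,
   one byte per character, as a plain memcpy of a narrow string would.
   Returns a malloc'd NUL-terminated buffer, or NULL on allocation
   failure or if a code point is above 0xFF. */
char *
encode_latin1 (const uint32_t *cps, size_t n, size_t *out_len)
{
  unsigned char *buf;
  size_t i;

  assert (out_len != NULL);
  buf = malloc (n + 1);
  if (buf == NULL)
    return NULL;
  for (i = 0; i < n; i++)
    {
      if (cps[i] > 0xFF)
        {
          free (buf);
          return NULL;
        }
      buf[i] = (unsigned char) cps[i];
    }
  buf[n] = '\0';
  *out_len = n;
  return (char *) buf;
}

/* Hypothetical helper, not a Guile API: encode code points as UTF-8.
   Returns a malloc'd NUL-terminated buffer, or NULL on allocation
   failure or if a code point is a surrogate or above 0x10FFFF. */
char *
encode_utf8 (const uint32_t *cps, size_t n, size_t *out_len)
{
  unsigned char *buf;
  size_t i, k = 0;

  assert (out_len != NULL);
  buf = malloc (4 * n + 1);   /* at most four bytes per code point */
  if (buf == NULL)
    return NULL;
  for (i = 0; i < n; i++)
    {
      uint32_t c = cps[i];

      if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        {
          free (buf);
          return NULL;
        }
      if (c < 0x80)
        buf[k++] = (unsigned char) c;
      else if (c < 0x800)
        {
          buf[k++] = (unsigned char) (0xC0 | (c >> 6));
          buf[k++] = (unsigned char) (0x80 | (c & 0x3F));
        }
      else if (c < 0x10000)
        {
          buf[k++] = (unsigned char) (0xE0 | (c >> 12));
          buf[k++] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
          buf[k++] = (unsigned char) (0x80 | (c & 0x3F));
        }
      else
        {
          buf[k++] = (unsigned char) (0xF0 | (c >> 18));
          buf[k++] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
          buf[k++] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
          buf[k++] = (unsigned char) (0x80 | (c & 0x3F));
        }
    }
  buf[k] = '\0';
  *out_len = k;
  return (char *) buf;
}
```

For the two code points of "hé" (0x68, 0xE9), encode_latin1 yields the
bytes 68 E9, while encode_utf8 yields 68 C3 A9. A lone E9 byte is not
valid UTF-8, which is consistent with the argument reaching the shell in
the wrong encoding when it is copied as Latin-1 under a UTF-8 locale.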