* [PATCH] Fix several POSIX functions to use the locale encoding
@ 2011-05-02 0:39 Mark H Weaver
2011-05-07 20:16 ` Ludovic Courtès
0 siblings, 1 reply; 2+ messages in thread
From: Mark H Weaver @ 2011-05-02 0:39 UTC (permalink / raw)
To: guile-devel
[-- Attachment #1: Type: text/plain, Size: 1174 bytes --]
Hello all,
GoKhlaYeh on #guile reported that although (system "echo hé") works
properly, (system* "/bin/sh" "-c" "echo hé") fails. Upon investigation,
I found that although Guile uses the locale encoding almost everywhere
when interfacing with C, several POSIX functions use Latin-1 in a few
places.
In particular: `system*', `execl', `execlp', `execle', `environ', and
`dynamic-args-call' use scm_i_allocate_string_pointers to convert a list
of SCM strings into an argv-style array of C strings. That function
does a simple memcpy for narrow strings, and throws an exception for
wide strings (via scm_i_string_chars).
`environ' was particularly broken, in that it would use Latin-1 when
setting the environment, and the locale encoding when reading the
environment. The `exec' functions would use the locale encoding to
encode the program name, and Latin-1 for the arguments and environment.
`system*' would use Latin-1 for everything.
This patch fixes all of these inconsistencies, by modifying
scm_i_allocate_string_pointers to use the locale encoding instead of
Latin-1.
Any objections to pushing this to stable-2.0?
Mark
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Fix several POSIX functions to use the locale encoding --]
[-- Type: text/x-diff, Size: 3810 bytes --]
From 00691082eb781ce9cad8da156ad170164b2cf5fc Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Sun, 1 May 2011 20:12:35 -0400
Subject: [PATCH] Fix several POSIX functions to use the locale encoding
* libguile/strings.c (scm_i_allocate_string_pointers): Encode strings
using the current locale. Previously, Latin-1 was used. Indirectly,
this affects the encoding of strings in `system*', `execl', `execlp',
`execle', `environ', and `dynamic-args-call'.
(scm_makfromstrs): In header comment, clarify that the C strings are
interpreted according to the current locale encoding.
* NEWS: Add NEWS entry.
---
NEWS | 12 ++++++++++++
libguile/strings.c | 33 +++++++++++++++++++--------------
2 files changed, 31 insertions(+), 14 deletions(-)
diff --git a/NEWS b/NEWS
index df6de65..e2495da 100644
--- a/NEWS
+++ b/NEWS
@@ -4,7 +4,19 @@ See the end for copying conditions.
Please send Guile bug reports to bug-guile@gnu.org.
+Changes in 2.0.2 (since 2.0.1):
+* Bugs fixed
+
+** Fixed several POSIX functions to use the locale encoding
+
+Fixed `system*', `execl', `execlp', `execle', `environ', and
+`dynamic-args-call' to encode all strings using the current locale
+encoding. Previously, Latin-1 was used to encode strings in several
+places, regardless of the locale.
+
+
+\f
Changes in 2.0.1 (since 2.0.0):
* Notable changes
diff --git a/libguile/strings.c b/libguile/strings.c
index bf63704..a70cb8d 100644
--- a/libguile/strings.c
+++ b/libguile/strings.c
@@ -2052,8 +2052,9 @@ SCM_DEFINE (scm_string_normalize_nfkd, "string-normalize-nfkd", 1, 0, 0,
}
#undef FUNC_NAME
-/* converts C scm_array of strings to SCM scm_list of strings. */
-/* If argc < 0, a null terminated scm_array is assumed. */
+/* converts C scm_array of strings to SCM scm_list of strings.
+ If argc < 0, a null terminated scm_array is assumed.
+ The current locale encoding is assumed */
SCM
scm_makfromstrs (int argc, char **argv)
{
@@ -2067,37 +2068,41 @@ scm_makfromstrs (int argc, char **argv)
}
/* Return a newly allocated array of char pointers to each of the strings
- in args, with a terminating NULL pointer. */
-
+ in args, with a terminating NULL pointer. The strings are encoded using
+ the current locale. */
char **
scm_i_allocate_string_pointers (SCM list)
#define FUNC_NAME "scm_i_allocate_string_pointers"
{
char **result;
- int len = scm_ilength (list);
+ int list_len = scm_ilength (list);
int i;
- if (len < 0)
+ if (list_len < 0)
scm_wrong_type_arg_msg (NULL, 0, list, "proper list");
- result = scm_gc_malloc ((len + 1) * sizeof (char *),
+ result = scm_gc_malloc ((list_len + 1) * sizeof (char *),
"string pointers");
- result[len] = NULL;
+ result[list_len] = NULL;
/* The list might be have been modified in another thread, so
we check LIST before each access.
*/
- for (i = 0; i < len && scm_is_pair (list); i++)
+ for (i = 0; i < list_len && scm_is_pair (list); i++)
{
- SCM str;
+ SCM str = SCM_CAR (list);
size_t len;
+ char *c_str = scm_to_locale_stringn (str, &len);
- str = SCM_CAR (list);
- len = scm_c_string_length (str);
-
+ /* OPTIMIZE-ME: Right now, scm_to_locale_stringn always uses
+ scm_malloc to allocate the returned string, which must be
+ explicitly deallocated. This forces us to copy the string a
+ second time into a new buffer. Ideally there would be variants
+ of scm_to_*_stringn that can return garbage-collected buffers. */
result[i] = scm_gc_malloc_pointerless (len + 1, "string pointers");
- memcpy (result[i], scm_i_string_chars (str), len);
+ memcpy (result[i], c_str, len);
result[i][len] = '\0';
+ free (c_str);
list = SCM_CDR (list);
}
--
1.7.1
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH] Fix several POSIX functions to use the locale encoding
2011-05-02 0:39 [PATCH] Fix several POSIX functions to use the locale encoding Mark H Weaver
@ 2011-05-07 20:16 ` Ludovic Courtès
0 siblings, 0 replies; 2+ messages in thread
From: Ludovic Courtès @ 2011-05-07 20:16 UTC (permalink / raw)
To: guile-devel
Hi Mark,
Mark H Weaver <mhw@netris.org> writes:
> In particular: `system*', `execl', `execlp', `execle', `environ', and
> `dynamic-args-call' use scm_i_allocate_string_pointers to convert a list
> of SCM strings into an argv-style array of C strings. That function
> does a simple memcpy for narrow strings, and throws an exception for
> wide strings (via scm_i_string_chars).
>
> `environ' was particularly broken, in that it would use Latin-1 when
> setting the environment, and the locale encoding when reading the
> environment. The `exec' functions would use the locale encoding to
> encode the program name, and Latin-1 for the arguments and environment.
> `system*' would use Latin-1 for everything.
Ouch, good catch!
> This patch fixes all of these inconsistencies, by modifying
> scm_i_allocate_string_pointers to use the locale encoding instead of
> Latin-1.
>
> Any objections to pushing this to stable-2.0?
Not from me, please go ahead!
It would be ideal if you could add a test case, say, for ‘system*’ and
‘environ’. You could use the ‘with-latin1-locale’ tricks.
Thanks,
Ludo’.
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2011-05-07 20:16 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-05-02 0:39 [PATCH] Fix several POSIX functions to use the locale encoding Mark H Weaver
2011-05-07 20:16 ` Ludovic Courtès
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).