unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* [PATCH] Fix several POSIX functions to use the locale encoding
@ 2011-05-02  0:39 Mark H Weaver
  2011-05-07 20:16 ` Ludovic Courtès
  0 siblings, 1 reply; 2+ messages in thread
From: Mark H Weaver @ 2011-05-02  0:39 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 1174 bytes --]

Hello all,

GoKhlaYeh on #guile reported that although (system "echo hé") works
properly, (system* "/bin/sh" "-c" "echo hé") fails.  Upon investigation,
I found that although Guile uses the locale encoding almost everywhere
when interfacing with C, several POSIX functions use Latin-1 in a few
places.

In particular: `system*', `execl', `execlp', `execle', `environ', and
`dynamic-args-call' use scm_i_allocate_string_pointers to convert a list
of SCM strings into an argv-style array of C strings.  That function
does a simple memcpy for narrow strings, and throws an exception for
wide strings (via scm_i_string_chars).

`environ' was particularly broken, in that it would use Latin-1 when
setting the environment, and the locale encoding when reading the
environment.  The `exec' functions would use the locale encoding to
encode the program name, and Latin-1 for the arguments and environment.
`system*' would use Latin-1 for everything.

This patch fixes all of these inconsistencies, by modifying
scm_i_allocate_string_pointers to use the locale encoding instead of
Latin-1.

Any objections to pushing this to stable-2.0?

      Mark



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Fix several POSIX functions to use the locale encoding --]
[-- Type: text/x-diff, Size: 3810 bytes --]

From 00691082eb781ce9cad8da156ad170164b2cf5fc Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Sun, 1 May 2011 20:12:35 -0400
Subject: [PATCH] Fix several POSIX functions to use the locale encoding

* libguile/strings.c (scm_i_allocate_string_pointers): Encode strings
  using the current locale.  Previously, Latin-1 was used.  Indirectly,
  this affects the encoding of strings in `system*', `execl', `execlp',
  `execle', `environ', and `dynamic-args-call'.

  (scm_makfromstrs): In header comment, clarify that the C strings are
  interpreted according to the current locale encoding.

* NEWS: Add NEWS entry.
---
 NEWS               |   12 ++++++++++++
 libguile/strings.c |   33 +++++++++++++++++++--------------
 2 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/NEWS b/NEWS
index df6de65..e2495da 100644
--- a/NEWS
+++ b/NEWS
@@ -4,7 +4,19 @@ See the end for copying conditions.
 
 Please send Guile bug reports to bug-guile@gnu.org.
 
+Changes in 2.0.2 (since 2.0.1):
 
+* Bugs fixed
+
+** Fixed several POSIX functions to use the locale encoding
+
+Fixed `system*', `execl', `execlp', `execle', `environ', and
+`dynamic-args-call' to encode all strings using the current locale
+encoding.  Previously, Latin-1 was used to encode strings in several
+places, regardless of the locale.
+
+
+\f
 Changes in 2.0.1 (since 2.0.0):
 
 * Notable changes
diff --git a/libguile/strings.c b/libguile/strings.c
index bf63704..a70cb8d 100644
--- a/libguile/strings.c
+++ b/libguile/strings.c
@@ -2052,8 +2052,9 @@ SCM_DEFINE (scm_string_normalize_nfkd, "string-normalize-nfkd", 1, 0, 0,
 }
 #undef FUNC_NAME
 
-/* converts C scm_array of strings to SCM scm_list of strings. */
-/* If argc < 0, a null terminated scm_array is assumed. */
+/* converts C scm_array of strings to SCM scm_list of strings.
+   If argc < 0, a null terminated scm_array is assumed.
+   The current locale encoding is assumed */
 SCM
 scm_makfromstrs (int argc, char **argv)
 {
@@ -2067,37 +2068,41 @@ scm_makfromstrs (int argc, char **argv)
 }
 
 /* Return a newly allocated array of char pointers to each of the strings
-   in args, with a terminating NULL pointer.  */
-
+   in args, with a terminating NULL pointer.  The strings are encoded using
+   the current locale. */
 char **
 scm_i_allocate_string_pointers (SCM list)
 #define FUNC_NAME "scm_i_allocate_string_pointers"
 {
   char **result;
-  int len = scm_ilength (list);
+  int list_len = scm_ilength (list);
   int i;
 
-  if (len < 0)
+  if (list_len < 0)
     scm_wrong_type_arg_msg (NULL, 0, list, "proper list");
 
-  result = scm_gc_malloc ((len + 1) * sizeof (char *),
+  result = scm_gc_malloc ((list_len + 1) * sizeof (char *),
 			  "string pointers");
-  result[len] = NULL;
+  result[list_len] = NULL;
 
   /* The list might be have been modified in another thread, so
      we check LIST before each access.
    */
-  for (i = 0; i < len && scm_is_pair (list); i++)
+  for (i = 0; i < list_len && scm_is_pair (list); i++)
     {
-      SCM str;
+      SCM str = SCM_CAR (list);
       size_t len;
+      char *c_str = scm_to_locale_stringn (str, &len);
 
-      str = SCM_CAR (list);
-      len = scm_c_string_length (str);
-
+      /* OPTIMIZE-ME: Right now, scm_to_locale_stringn always uses
+	 scm_malloc to allocate the returned string, which must be
+	 explicitly deallocated.  This forces us to copy the string a
+	 second time into a new buffer.  Ideally there would be variants
+	 of scm_to_*_stringn that can return garbage-collected buffers. */
       result[i] = scm_gc_malloc_pointerless (len + 1, "string pointers");
-      memcpy (result[i], scm_i_string_chars (str), len);
+      memcpy (result[i], c_str, len);
       result[i][len] = '\0';
+      free (c_str);
 
       list = SCM_CDR (list);
     }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] Fix several POSIX functions to use the locale encoding
  2011-05-02  0:39 [PATCH] Fix several POSIX functions to use the locale encoding Mark H Weaver
@ 2011-05-07 20:16 ` Ludovic Courtès
  0 siblings, 0 replies; 2+ messages in thread
From: Ludovic Courtès @ 2011-05-07 20:16 UTC (permalink / raw)
  To: guile-devel

Hi Mark,

Mark H Weaver <mhw@netris.org> writes:

> In particular: `system*', `execl', `execlp', `execle', `environ', and
> `dynamic-args-call' use scm_i_allocate_string_pointers to convert a list
> of SCM strings into an argv-style array of C strings.  That function
> does a simple memcpy for narrow strings, and throws an exception for
> wide strings (via scm_i_string_chars).
>
> `environ' was particularly broken, in that it would use Latin-1 when
> setting the environment, and the locale encoding when reading the
> environment.  The `exec' functions would use the locale encoding to
> encode the program name, and Latin-1 for the arguments and environment.
> `system*' would use Latin-1 for everything.

Ouch, good catch!

> This patch fixes all of these inconsistencies, by modifying
> scm_i_allocate_string_pointers to use the locale encoding instead of
> Latin-1.
>
> Any objections to pushing this to stable-2.0?

Not from me, please go ahead!

It would be ideal if you could add a test case, say, for ‘system*’ and
‘environ’.  You could use the ‘with-latin1-locale’ tricks.

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2011-05-07 20:16 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-05-02  0:39 [PATCH] Fix several POSIX functions to use the locale encoding Mark H Weaver
2011-05-07 20:16 ` Ludovic Courtès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).