unofficial mirror of bug-guile@gnu.org 
 help / color / mirror / Atom feed
* bug#31343: scm_c_primitive_load behavior/documentation bug
@ 2018-05-02 17:35 Tom Balzer
  2018-05-28 12:55 ` Mark H Weaver
  0 siblings, 1 reply; 2+ messages in thread
From: Tom Balzer @ 2018-05-02 17:35 UTC (permalink / raw)
  To: 31343


Hello -

In ./guile/libguile/load.c, the function scm_c_primitive_load converts a
c string to a SCM string via scm_from_locale_string. I was reading the
manual and in section 6.6.5.14, it says:

> C Function: SCM scm_from_locale_string (const char *str)
> C Function: SCM scm_from_locale_stringn (const char *str, size_t
>
> [...]
>
> Note that these functions should _not_ be used to convert C string
> constants, because there is no guarantee that the current locale
> will match that of the execution character set, used for string and
> character constants.  Most modern C compilers use UTF-8 by default,
> so to convert C string constants we recommend
> ‘scm_from_utf8_string’.

This implies to me that you should not use scm_c_primitive_load with any
constant, like this:

#include <libguile.h>
#include <stdlib.h>

#define FILE "/home/niebie/sc/sdl/states.scm"

void *some_func(void *arg){
  SCM scm_c_primitive_load(FILE);
  
  return NULL;
}

int main(int argc, char **argv){
  void *res = scm_with_guile(some_func, NULL);
  return EXIT_SUCCESS;
}

I saw this only by reading the source for this function, as from the
documentation it isn't obvious. I am sending this to bug-guile because I
think that either this is a documentation bug or an implementation
bug. In either case I am happy to send a patch that fixes whichever is
at fault.

A counter example is scm_c_public_variable in modules.c, which uses
scm_from_utf8_symbol on the inputs, which precludes the use of dynamic c
strings for the input. Again, not something documented. I would think
both of these functions would do things the same way.

Thanks,
Tom





^ permalink raw reply	[flat|nested] 2+ messages in thread

* bug#31343: scm_c_primitive_load behavior/documentation bug
  2018-05-02 17:35 bug#31343: scm_c_primitive_load behavior/documentation bug Tom Balzer
@ 2018-05-28 12:55 ` Mark H Weaver
  0 siblings, 0 replies; 2+ messages in thread
From: Mark H Weaver @ 2018-05-28 12:55 UTC (permalink / raw)
  To: Tom Balzer; +Cc: 31343

Hi Tom,

Sorry for taking so long to respond.

Tom Balzer <niebieskitrociny@gmail.com> writes:

> In ./guile/libguile/load.c, the function scm_c_primitive_load converts a
> c string to a SCM string via scm_from_locale_string. I was reading the
> manual and in section 6.6.5.14, it says:
>
>> C Function: SCM scm_from_locale_string (const char *str)
>> C Function: SCM scm_from_locale_stringn (const char *str, size_t
>>
>> [...]
>>
>> Note that these functions should _not_ be used to convert C string
>> constants, because there is no guarantee that the current locale
>> will match that of the execution character set, used for string and
>> character constants.  Most modern C compilers use UTF-8 by default,
>> so to convert C string constants we recommend
>> ‘scm_from_utf8_string’.
>
> This implies to me that you should not use scm_c_primitive_load with any
> constant, like this:
>
> #include <libguile.h>
> #include <stdlib.h>
>
> #define FILE "/home/niebie/sc/sdl/states.scm"
>
> void *some_func(void *arg){
>   SCM scm_c_primitive_load(FILE);

If the C string literal contains only ASCII characters, then it doesn't
matter either way, because all C locale encodings are ASCII-compatible.
Perhaps we should make that more clear in the documentation that you
quoted above.

A related question is whether we should change the API of
'scm_c_primitive_load' to expect a UTF-8 encoded file name instead of a
locale encoded one.

If the file name comes from a C string literal, then it will probably be
UTF-8 encoded, because that's what modern compilers tend to do.  On the
other hand, if the file name comes from somewhere else, e.g. from user
input, POSIX command line arguments, or environment variables, then it
should probably be the locale encoding.

I'm inclined to leave 'scm_c_primitive_load' as it is, because the
expected encoding is effectively part of the API.  Some programs might
depend on its current behavior, and file names are reasonably likely to
come from sources like environment variables or command-line arguments.
Furthermore, file names in C string literals are quite likely to be
ASCII-only anyway.

> I saw this only by reading the source for this function, as from the
> documentation it isn't obvious. I am sending this to bug-guile because I
> think that either this is a documentation bug or an implementation
> bug. In either case I am happy to send a patch that fixes whichever is
> at fault.
>
> A counter example is scm_c_public_variable in modules.c, which uses
> scm_from_utf8_symbol on the inputs, which precludes the use of dynamic c
> strings for the input. Again, not something documented. I would think
> both of these functions would do things the same way.

I'm not sure, because file names are reasonably likely to come from
external sources that are likely to be locale encoded, whereas Scheme
variable names are overwhelmingly likely to be C string literals.

In any case, these are longstanding APIs, so I don't think we should
change them.

So, I think the proper fixes here are to the documentation.  As you
suggested, the documentation for 'scm_c_public_variable',
'scm_c_primitive_load', and all other C functions in our API should
specify the encoding for C string arguments.

If you'd like to work on it, I'd be glad to accept documentation fixes
along these lines.

     Thanks!
       Mark





^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2018-05-28 12:55 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-02 17:35 bug#31343: scm_c_primitive_load behavior/documentation bug Tom Balzer
2018-05-28 12:55 ` Mark H Weaver

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).