From: Mark H Weaver
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#31343: scm_c_primitive_load behavior/documentation bug
Date: Mon, 28 May 2018 08:55:56 -0400
Message-ID: <87bmd0x7kz.fsf@netris.org>
References: <877eom2c63.fsf@gmail.com>
In-Reply-To: <877eom2c63.fsf@gmail.com> (Tom Balzer's message of "Wed, 02 May 2018 12:35:54 -0500")
To: Tom Balzer
Cc: 31343@debbugs.gnu.org

Hi Tom,

Sorry for taking so long to respond.

Tom Balzer writes:

> In ./guile/libguile/load.c, the function scm_c_primitive_load converts a
> c string to a SCM string via scm_from_locale_string. I was reading the
> manual and in section 6.6.5.14, it says:
>
>> C Function: SCM scm_from_locale_string (const char *str)
>> C Function: SCM scm_from_locale_stringn (const char *str, size_t
>>
>> [...]
>>
>> Note that these functions should _not_ be used to convert C string
>> constants, because there is no guarantee that the current locale
>> will match that of the execution character set, used for string and
>> character constants. Most modern C compilers use UTF-8 by default,
>> so to convert C string constants we recommend
>> ‘scm_from_utf8_string’.
>
> This implies to me that you should not use scm_c_primitive_load with any
> constant, like this:
>
> #include
> #include
>
> #define FILE "/home/niebie/sc/sdl/states.scm"
>
> void *some_func(void *arg){
>   SCM scm_c_primitive_load(FILE);

If the C string literal contains only ASCII characters, then it doesn't
matter either way, because all C locale encodings are ASCII-compatible.
Perhaps we should make that clearer in the documentation that you
quoted above.

A related question is whether we should change the API of
'scm_c_primitive_load' to expect a UTF-8 encoded file name instead of a
locale encoded one. If the file name comes from a C string literal,
then it will probably be UTF-8 encoded, because that's what modern
compilers tend to do. On the other hand, if the file name comes from
somewhere else, e.g. from user input, POSIX command line arguments, or
environment variables, then it should probably be the locale encoding.

I'm inclined to leave 'scm_c_primitive_load' as it is, because the
expected encoding is effectively part of the API. Some programs might
depend on its current behavior, and file names are reasonably likely to
come from sources like environment variables or command-line arguments.
Furthermore, file names in C string literals are quite likely to be
ASCII-only anyway.

> I saw this only by reading the source for this function, as from the
> documentation it isn't obvious. I am sending this to bug-guile because I
> think that either this is a documentation bug or an implementation
> bug. In either case I am happy to send a patch that fixes whichever is
> at fault.
>
> A counter example is scm_c_public_variable in modules.c, which uses
> scm_from_utf8_symbol on the inputs, which precludes the use of dynamic c
> strings for the input. Again, not something documented. I would think
> both of these functions would do things the same way.

I'm not sure, because file names are reasonably likely to come from
external sources that are likely to be locale encoded, whereas Scheme
variable names are overwhelmingly likely to be C string literals. In
any case, these are longstanding APIs, so I don't think we should change
them.

So, I think the proper fixes here are to the documentation. As you
suggested, the documentation for 'scm_c_public_variable',
'scm_c_primitive_load', and all other C functions in our API should
specify the encoding for C string arguments. If you'd like to work on
it, I'd be glad to accept documentation fixes along these lines.

     Thanks!
       Mark
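
To illustrate the ASCII-only point made above, here is a minimal sketch
of a check a caller could run on a file name before deciding whether the
encoding question matters at all. The helper name is_ascii_only is
hypothetical and is not part of the Guile API; it only inspects bytes
and does not call into libguile. The assumption, taken from the message,
is that an ASCII-only string is read the same way under the locale
encoding and under UTF-8.

```c
#include <stdbool.h>

/* Hypothetical helper, NOT part of the Guile API.
   Returns true when every byte of the NUL-terminated string S is
   7-bit ASCII, i.e. when the locale-vs-UTF-8 question discussed in
   this thread should make no difference for S.  */
bool
is_ascii_only (const char *s)
{
  const unsigned char *p;

  for (p = (const unsigned char *) s; *p != '\0'; p++)
    if (*p > 0x7F)   /* high bit set: byte is outside ASCII */
      return false;

  return true;
}
```

For a name that fails this check and comes from a C string literal, one
possible approach (a suggestion, not something this thread settles) is
to convert it explicitly with scm_from_utf8_string and pass the result
to scm_primitive_load, rather than going through scm_c_primitive_load
and the locale encoding.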