From: ludo@gnu.org (Ludovic Courtès)
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Wide string strategies
Date: Thu, 09 Apr 2009 22:25:55 +0200
Message-ID: <87prflefqk.fsf@gnu.org>
References: <1239289212.5673.52.camel@localhost.localdomain>
To: guile-devel@gnu.org

Hi!

Thank you for working on this!

Mike Gran writes:

> For now, I think a good strategy is to make strings into a pseudo-class
> where the internals are opaque to most of Guile and strings are accessed
> through accessors and other methods.

Sounds reasonable.

> This was the strategy already begun with scm_to_locale_string but the
> code isn't fully committed to the idea.  The function scm_i_string_chars
> exposes the internal representation of the string, and it is used
> throughout the code.

Yes, that's going to be difficult, but I can't think of a better
solution.

> -      SCM_WTA_DISPATCH_1 (*SCM_SUBR_GENERIC (proc), arg1,
> -                          SCM_ARG1, scm_i_symbol_chars (SCM_SNAME (proc)));
> +      {
> +        char *str = scm_to_locale_string (scm_symbol_to_string (SCM_SNAME (proc)));
> +        SCM_WTA_DISPATCH_1 (*SCM_SUBR_GENERIC (proc), arg1, SCM_ARG1, str);
> +        free (str);
> +      }

This is the kind of thing we can't afford in most cases.

Here STR is only needed because `SCM_WTA_DISPATCH_1 ()' calls
`scm_wrong_type_arg ()', which operates on C strings.  One solution
would be to change `scm_wrong_type_arg ()' to operate on opaque strings
(e.g., take an `SCM' instead of a `const char *').  The same applies to
all the functions in "error.h", and probably many others.

> -  if (len > 0 && (s[0] == '/' || s[0] == '\\'))
> +  if (len > 0 && (scm_i_string_ref_eq_char (filename, 0, '/')
> +                  || scm_i_string_ref_eq_char (filename, 0, '\\')))

I think procedures like `scm_i_string_ref_eq_char ()' are a good idea
because they fulfill both the goal of having an opaque string type *and*
the goal of being able to handle strings easily in C.

> -  scm_ungets (scm_i_string_chars (str), scm_i_string_length (str), port);
> +  buf = scm_to_locale_stringn (str, &len);
> +  scm_ungets (buf, len, port);
> +  free (buf);

Eventually, we might need to change `scm_ungets ()', or provide a
variant that takes an opaque string.

> @@ -1549,12 +1552,16 @@ SCM_DEFINE (scm_recvfrom, "recvfrom!", 2, 3, 0,
>
>    /* recvfrom will not necessarily return an address.  usually nothing
>       is returned for stream sockets.  */
> -  buf = scm_i_string_writable_chars (str);
> +  buf = scm_malloc (cend - offset);

The whole POSIX interface needs fast access to ASCII strings.

How about something like:

  const char *layout = scm_i_ascii_symbol_chars (SCM_PACK (slayout));

where `scm_i_ascii_symbol_chars ()' throws an exception if its argument
is a non-ASCII symbol?  This would mean special-casing ASCII stringbufs
so that we can treat them as C strings.
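
To make the two ideas above a bit more concrete, here is a rough,
self-contained sketch.  It is not Guile code: `toy_string' and the
`toy_*' functions are made-up names, the narrow/wide layout is only an
assumption about what a stringbuf could look like, and returning NULL
stands in for throwing an exception.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a stringbuf: either narrow (one byte per
   character, assumed NUL-terminated) or wide (one 32-bit code point per
   character).  Not Guile's actual layout.  */
typedef struct
{
  int is_wide;
  size_t len;
  const unsigned char *narrow;  /* used when !is_wide */
  const uint32_t *wide;         /* used when is_wide */
} toy_string;

/* Return the code point at index I, whatever the representation.  */
uint32_t
toy_string_ref (const toy_string *s, size_t i)
{
  assert (s != NULL && i < s->len);
  return s->is_wide ? s->wide[i] : (uint32_t) s->narrow[i];
}

/* Rough analogue of `scm_i_string_ref_eq_char ()': compare one character
   with a C `char' without exposing the buffer to the caller.  */
int
toy_string_ref_eq_char (const toy_string *s, size_t i, char c)
{
  return toy_string_ref (s, i) == (uint32_t) (unsigned char) c;
}

/* Rough analogue of the proposed `scm_i_ascii_symbol_chars ()': hand out
   the buffer as a C string only when it is narrow and pure ASCII.
   Returns NULL where the real thing would throw.  */
const char *
toy_ascii_chars (const toy_string *s)
{
  size_t i;

  assert (s != NULL);
  if (s->is_wide)
    return NULL;
  for (i = 0; i < s->len; i++)
    if (s->narrow[i] > 0x7F)
      return NULL;
  return (const char *) s->narrow;
}
```

The scan in `toy_ascii_chars ()' is linear in the string length; a real
implementation would presumably record "this stringbuf is ASCII" once,
at creation time, so that the check is a single flag test.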

> +static const char *
> +make_failsafe_ascii_sz (const char *str, size_t len)
> +{
> +  static char buf[SCM_FAILSAFE_STRING_LEN];

Ouch, that would be bug-prone.

> +SCM_INTERNAL int scm_i_string_ref_eq_char (SCM str, size_t x, char c);
> +SCM_INTERNAL int scm_i_symbol_ref_eq_char (SCM str, size_t x, char c);
> +SCM_INTERNAL int scm_i_string_ref_neq_char (SCM str, size_t x, char c);
> +SCM_INTERNAL int scm_i_symbol_ref_neq_char (SCM str, size_t x, char c);

I'd remove the `neq' variants.

> +SCM_INTERNAL int scm_i_string_ref_eq_int (SCM str, size_t x, int c);

Does it assume that `int' is at least 32 bits wide (i.e.,
sizeof (int) >= 4)?

> +SCM_INTERNAL size_t scm_i_string_contains_char (SCM str, char ch);

Since it really returns a boolean, I'd use `int' as the return type.

> +SCM_INTERNAL char *scm_i_string_to_write_sz (SCM str);
> +SCM_INTERNAL scm_t_uint8 *scm_i_string_to_u8sz (SCM str);
> +SCM_INTERNAL SCM scm_i_string_from_u8sz (const scm_t_uint8 *str);
> +SCM_INTERNAL const char *scm_i_string_to_failsafe_ascii_sz (SCM str);
> +SCM_INTERNAL const char *scm_i_symbol_to_failsafe_ascii_sz (SCM str);

What does "sz" mean?

> +/* For ASCII strings, SUB can be used to represent an invalid
> +   character.  */
> +#define SCM_SUB ('\x1A')

Why SUB?  How about `SCM_I_SUB_CHAR', `SCM_I_INVALID_ASCII_CHAR' or
similar?

Thanks,
Ludo'.