From: Mike Gran
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Wide string strategies
Date: Thu, 09 Apr 2009 20:39:18 -0700
Message-ID: <1239334758.7191.104.camel@localhost.localdomain>
References: <1239289212.5673.52.camel@localhost.localdomain> <87prflefqk.fsf@gnu.org>
In-Reply-To: <87prflefqk.fsf@gnu.org>
To: Ludovic Courtès
Cc: guile-devel@gnu.org

On Thu, 2009-04-09 at 22:25 +0200, Ludovic Courtès wrote:
> Hi!
>
> > -      SCM_WTA_DISPATCH_1 (*SCM_SUBR_GENERIC (proc), arg1,
> > -                          SCM_ARG1, scm_i_symbol_chars (SCM_SNAME (proc)));
> > +      {
> > +        char *str = scm_to_locale_string (scm_symbol_to_string (SCM_SNAME (proc)));
> > +        SCM_WTA_DISPATCH_1 (*SCM_SUBR_GENERIC (proc), arg1, SCM_ARG1, str);
> > +        free (str);
> > +      }
>
> This is the kind of thing we can't afford in most cases.
>
> Here STR is only needed because `SCM_WTA_DISPATCH_1 ()' calls
> `scm_wrong_type_arg ()', which operates on C strings.
>
> One solution would be to change `scm_wrong_type_arg ()' to operate on
> opaque strings (e.g., take an `SCM' instead of `const char *').  The
> same applies to all the functions in "error.h", and probably many
> others.
>

Makes sense.

> I think procedures like `scm_i_string_ref_eq_char ()' are a good idea
> because it fulfills the goal of having an opaque string type *and* the
> goal of being able to handle them easily in C.

I like it, too.

> All the POSIX interface needs fast access to ASCII strings.  How about
> something like:
>
>   const char *layout = scm_i_ascii_symbol_chars (SCM_PACK (slayout));
>
> where `scm_i_ascii_symbol_chars ()' throws an exception if its argument
> is a non-ASCII symbol?
>
> This would mean special-casing ASCII stringbufs so that we can treat
> them as C strings.

OK.  Fast ASCII strings for the evaluator and for POSIX should be easy
enough.  Are there any other modules that definitely require fast
strings?

Also, the interaction between strings and sockets needs more thought.
If sendto and recvfrom are used for datagram transmission, as their
docstrings suggest, then locale string conversion could be a bad idea.
(And these functions should also operate on u8vectors, but that's
another issue.)

To be more general, I know some apps depend on 8-bit strings and use
them as storage for non-string binary data.  I think SND falls into
this category.  I wonder if ultimately wide strings would have to be a
run-time option that is off by default.

But I am (choose your English idiom here) getting ahead of myself, or
jumping the gun, or putting the cart before the horse.
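
To make the ASCII fast path above a little more concrete, here is a
minimal sketch.  It is not taken from the patch or from Guile:
`struct demo_stringbuf' and `demo_ascii_chars ()' are hypothetical
names, the buffer layout is assumed purely for illustration, and the
function returns NULL where the proposed `scm_i_ascii_symbol_chars ()'
would throw an exception.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a stringbuf; NOT Guile's real layout.
   A buffer is either narrow (one byte per character, with a
   terminating NUL at narrow[len]) or wide (one 32-bit code point
   per character).  */
struct demo_stringbuf
{
  int is_wide;            /* nonzero if `wide' holds the characters */
  size_t len;             /* number of characters */
  const char *narrow;     /* valid when !is_wide */
  const uint32_t *wide;   /* valid when is_wide */
};

/* Return the buffer's characters as a C string if the buffer is
   narrow and every character is a non-NUL 7-bit ASCII character.
   Otherwise return NULL; a real implementation would presumably
   signal a Scheme error at that point instead.  */
const char *
demo_ascii_chars (const struct demo_stringbuf *buf)
{
  size_t i;

  assert (buf != NULL);
  if (buf->is_wide)
    return NULL;

  for (i = 0; i < buf->len; i++)
    {
      unsigned char c = (unsigned char) buf->narrow[i];
      if (c == 0 || c > 0x7F)
        return NULL;
    }
  return buf->narrow;
}
```

The point of the sketch is that the common case costs one scan and no
allocation, so callers such as the POSIX wrappers would not need the
malloc/free pair shown in the quoted diff.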
> > +SCM_INTERNAL int scm_i_string_ref_eq_char (SCM str, size_t x, char c);
> > +SCM_INTERNAL int scm_i_symbol_ref_eq_char (SCM str, size_t x, char c);
> > +SCM_INTERNAL int scm_i_string_ref_neq_char (SCM str, size_t x, char c);
> > +SCM_INTERNAL int scm_i_symbol_ref_neq_char (SCM str, size_t x, char c);
>
> I'd remove the `neq' variants.
>

Sure.

> > +SCM_INTERNAL int scm_i_string_ref_eq_int (SCM str, size_t x, int c);
>
> Does it assume sizeof (int) >= 32 ?

I suppose it does.  But I only used it to compare against the output of
scm_getc, which also returns an int.

>
> > +SCM_INTERNAL size_t scm_i_string_contains_char (SCM str, char ch);
>
> Since it really returns a boolean, I'd use `int' as the return type.

Makes sense.

>
> > +SCM_INTERNAL char *scm_i_string_to_write_sz (SCM str);
> > +SCM_INTERNAL scm_t_uint8 *scm_i_string_to_u8sz (SCM str);
> > +SCM_INTERNAL SCM scm_i_string_from_u8sz (const scm_t_uint8 *str);
> > +SCM_INTERNAL const char *scm_i_string_to_failsafe_ascii_sz (SCM str);
> > +SCM_INTERNAL const char *scm_i_symbol_to_failsafe_ascii_sz (SCM str);
>
> What does "sz" mean?

Back in the day, "sz" was Microsoft-speak for the pointer to the first
character of a null-terminated char string.  By not knowing that, you
have demonstrated that you remain unpolluted. ;-)

I probably was trying to avoid writing "scm_i_string_to_string."

>
> > +/* For ASCII strings, SUB can be used to represent an invalid
> > +   character.  */
> > +#define SCM_SUB ('\x1A')
>
> Why SUB?  How about `SCM_I_SUB_CHAR', `SCM_I_INVALID_ASCII_CHAR' or
> similar?

If you're asking why SUB is set to 0x1A, the ECMA-48 standard says
0x1A should be used to indicate an invalid ASCII character.

If you're asking why I just called it SCM_SUB, laziness.
SCM_I_INVALID_ASCII_CHAR works for me.

>
> Thanks,
> Ludo'.
>
>

I'll try to rework this next week.

-Mike
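
For readers following the thread, the character comparison and the
"failsafe ASCII" conversion discussed above might look roughly like
the sketch below.  This is an illustration only, not the code from the
patch: the `demo_' names are hypothetical, strings are modelled as
plain arrays of 32-bit code points rather than `SCM' values, and the
choices to treat NUL as invalid and to return a caller-freed buffer
are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* SUB (0x1A) stands in for any character that cannot be represented
   in ASCII.  The name is hypothetical.  */
#define DEMO_SUB_CHAR ('\x1A')

/* Return nonzero if the code point at index X equals the ASCII
   character C.  An out-of-range index compares unequal.  */
int
demo_ref_eq_char (const uint32_t *chars, size_t len, size_t x, char c)
{
  assert (chars != NULL);
  return x < len && chars[x] == (uint32_t) (unsigned char) c;
}

/* Return a newly allocated, NUL-terminated ASCII copy of CHARS in
   which every code point outside 1..0x7F is replaced by SUB.
   Returns NULL if allocation fails; the caller frees the result.  */
char *
demo_to_failsafe_ascii (const uint32_t *chars, size_t len)
{
  char *out;
  size_t i;

  assert (chars != NULL || len == 0);
  out = malloc (len + 1);
  if (out == NULL)
    return NULL;

  for (i = 0; i < len; i++)
    {
      if (chars[i] >= 1 && chars[i] <= 0x7F)
        out[i] = (char) chars[i];
      else
        out[i] = DEMO_SUB_CHAR;
    }
  out[len] = '\0';
  return out;
}
```

Because the conversion never fails on content, it suits error messages
and debugging output, where a lossy rendering is preferable to a
second error.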