From: Mike Gran
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Wide string strategies
Date: Fri, 10 Apr 2009 10:14:00 -0700 (PDT)
Message-ID: <316910.55438.qm@web37903.mail.mud.yahoo.com>
References: <1239289212.5673.52.camel@localhost.localdomain> <87prflefqk.fsf@gnu.org> <1239334758.7191.104.camel@localhost.localdomain> <87r601aql5.fsf@gnu.org>
In-Reply-To: <87r601aql5.fsf@gnu.org>
To: Ludovic Courtès
Cc: guile-devel@gnu.org

> From: Ludovic Courtès
> Mike Gran writes:
> > On Thu, 2009-04-09 at 22:25 +0200, Ludovic Courtès wrote:

> Actually, for the file system interface, for instance, it's even
> trickier: the encoding of file names usually isn't specified, but some
> apps/libraries have their opinion on that, e.g., Glib
> (http://library.gnome.org/devel/glib/unstable/glib-File-Utilities.html).
> We should probably follow their lead here, but that's a secondary
> problem anyway.

True.  The one real standard that I do know of is that NTFS stores
filenames as UTF-16.

>
> > Also, the interaction between strings and sockets needs more thought.
> > If sendto and recvfrom are used for datagram transmission, as their
> > docstrings suggest, then locale string conversion could be a
> > bad idea.  (And, these functions should also operate on u8vectors, but
> > that's another issue.)
>
> Agreed.
>
> > To be more general, I know some apps depend on 8-bit strings and use
> > them as storage of non-string binary data.
>
> Yes, notably because of `sendto' et al. that take a string.
>
> > I think SND falls into this
> > category.  I wonder if ultimately wide strings would have to be a
> > run-time option that is off by default.  But I am (choose your English
> > idiom here) getting ahead of myself, or jumping the gun, or putting the
> > cart before the horse.
>
> I don't have any idea of how we could usefully handle that.
>
> Eventually, it may be a good idea to deprecate `(sendto "foobar")' in
> favor of a variant that takes a bytevector or some such.

Maybe it's best to leave them unchanged w.r.t. strings.  Any char values
between 128 and 255 would just be interpreted as if they were UCS-4
characters 128 to 255 and get put in the strings directly.

In the short term, socket functions could also be modified
to take both strings and u8vectors.  Then, if someone was actually
pushing UTF-8 strings over the network, they could use
"utf8-encoded-u8vector->string" or some such to do the conversion.

And, in the long run, sockets can become a type of port, and those
ports can have attached transcoding.

>
> >> > +SCM_INTERNAL int scm_i_string_ref_eq_int (SCM str, size_t x, int c);
> >>
> >> Does it assume sizeof (int) >= 32 ?
> >
> > I suppose it does.  But, I only used it to compare to the output of
> > scm_getc which also returns an int.
>
> I meant, is the intent that C contains a codepoint?

Yes.  And when wide strings are implemented, the gnulib convention is
that a wide character is represented in C as a uint32.

>
> >> > +SCM_INTERNAL char *scm_i_string_to_write_sz (SCM str);
> >> > +SCM_INTERNAL scm_t_uint8 *scm_i_string_to_u8sz (SCM str);
> >> > +SCM_INTERNAL SCM scm_i_string_from_u8sz (const scm_t_uint8 *str);
> >> > +SCM_INTERNAL const char *scm_i_string_to_failsafe_ascii_sz (SCM str);
> >> > +SCM_INTERNAL const char *scm_i_symbol_to_failsafe_ascii_sz (SCM str);

> How about:
>
>   SCM scm_i_from_ascii_string (const scm_t_uint8 *str);
>
> and similar?

OK.

> >>
> >> > +/* For ASCII strings, SUB can be used to represent an invalid
> >> > +   character.  */
> >> > +#define SCM_SUB ('\x1A')
> >>
> >> Why SUB?  How about `SCM_I_SUB_CHAR', `SCM_I_INVALID_ASCII_CHAR' or
> >> similar?
> >
> > If you're asking why SUB is set to 0x1A, the standard ECMA-48 says 0x1A
> > should be used to indicate an invalid ASCII character.
>
> I suspected that.  Then `SCM_I_SUB_CHAR' may be a good name, perhaps
> with a comment saying that this is the "official SUB character".
>

OK.

Thanks,

Mike
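
As a rough illustration of two ideas from the message above (treating
char values 128 to 255 as code points 128 to 255, and substituting SUB,
0x1A, for anything that is not ASCII), here is a minimal C sketch.  The
names example_widen_bytes, example_to_failsafe_ascii and
EXAMPLE_SUB_CHAR are made up for this example and are not Guile API; the
scm_i_* functions discussed in the thread operate on SCM values and may
well behave differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; stands in for the SUB character (0x1A).  */
#define EXAMPLE_SUB_CHAR ('\x1A')

/* Widen LEN 8-bit values into 32-bit code points without any locale
   conversion: byte value N becomes code point N, for N in 0..255.
   DST must have room for LEN elements.  */
void
example_widen_bytes (const uint8_t *src, size_t len, uint32_t *dst)
{
  size_t i;

  for (i = 0; i < len; i++)
    dst[i] = src[i];
}

/* Write a NUL-terminated ASCII rendering of LEN code points into DST,
   which must have room for LEN + 1 chars.  Any code point outside
   0..127 is replaced by SUB.  */
void
example_to_failsafe_ascii (const uint32_t *src, size_t len, char *dst)
{
  size_t i;

  for (i = 0; i < len; i++)
    dst[i] = (src[i] < 128) ? (char) src[i] : EXAMPLE_SUB_CHAR;
  dst[len] = '\0';
}
```

Widening this way keeps binary data stored in 8-bit strings intact,
since no byte is reinterpreted through the locale; the failsafe
conversion is lossy by design and is only meant for output where a
non-ASCII character must not appear.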