From: Mike Gran
Newsgroups: gmane.lisp.guile.devel
To: guile-devel
Subject: Re: scm_to_locale_stringbuf
Date: Tue, 3 Feb 2009 15:46:52 -0800 (PST)
Message-ID: <360535.25627.qm@web37903.mail.mud.yahoo.com>
References: <493834.3827.qm@web37902.mail.mud.yahoo.com> <874ozbjfwt.fsf@arudy.ossau.uklinux.net>

> From: Neil Jerram neil@ossau.uklinux.net
>
> I'm afraid I don't understand the problem, on two counts.
>
> 1. The doc (in the manual) says that scm_to_locale_stringbuf doesn't
> add a terminating \0.  So presumably any \0s present must be padding.
>
> 2. The doc also says that if scm_to_locale_stringbuf's return value
> is > max_len (as it would be in your case), the caller should call it
> again with a larger buffer.

Right now, the internal coding of strings is an unspecified 8-bit
encoding, and it is assumed to be compatible with the locale in which
Guile is running.

So if I have a Guile string containing an 8-bit character between 128
and 255, it just gets passed through.  If I request the contents of
that string from C with scm_to_locale_string, it simply returns the
buffer of the Scheme string.

But in future, scm_to_locale_string and scm_to_locale_stringbuf should
actually do the proper conversion to the current locale, so that wide
characters are printed properly.

So, if we move the internal representation of strings away from
unspecified 8-bit data and toward something concrete, like ISO-8859-1
or UCS-4, and if a program is running in an environment whose locale
has a multibyte encoding like UTF-8, then the resulting locale string
could contain multibyte characters.

Consider a Scheme string that is internally the single character
"LATIN SMALL LETTER A WITH ACUTE", which is U+00E1.  If the locale
were some sort of UTF-8, like en_US.utf-8, this letter should become
the two bytes 0xC3 and 0xA1 when converted to the locale.

So what should happen in this case if I call
scm_to_locale_stringbuf (str, buf, 1)?  Note that here BUF can only
hold 1 byte.  Should the single byte 0xC3 be copied into it, which
creates an invalid string?  Or should nothing be copied into it?  In
either case, the API should provide some way of reporting that the
last character was incomplete, because outputting just the one byte
0xC3 would cause problems somewhere down the road.

So what I was saying was that in this case the best thing to do might
be to pad the output buffer with '\0' instead of putting in half of a
multibyte character, and then signal that there is some padding at
the end of the string.

For instance, one could have a function

  scm_to_locale_stringbufn (SCM str, char *buf, size_t max_len, size_t *len_used)

where LEN_USED is the number of bytes of the buffer that were actually
used.

Sorry for the book-length explanation,

Mike Gran