From: ludo@gnu.org (Ludovic Courtès)
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Unicode strings and symbols
Date: Mon, 10 Aug 2009 23:27:48 +0200
Message-ID: <87prb3fjfv.fsf@gnu.org>
References: <87skg0sv8h.fsf@gnu.org> <1249857644.30365.53.camel@localhost.localdomain>
To: guile-devel@gnu.org

Hey,

Mike Gran writes:

[...]

>> > +SCM_API void scm_charprint (scm_t_uint32 c, SCM port);
>>
>> This ought to be internal, no?
>
> Could be.  A couple of the types are given their own print functions:
> scm_intprint and an scm_uintprint.  Most types don't have their own
> print functions.  Are int and uint given special treatment because of
> their radix term?

Dunno.  Anyway, they're not really meant to be public either.  Feel free
to make them internal as well, while you're at it.
;-)

>> > +          (scm_t_wchar) (unsigned char) STRINGBUF_INLINE_CHARS (buf)[i];
>>
>> Is the double cast needed?
>
> Sort of.  Unsigned char will successfully be implicitly cast to
> scm_t_wchar, so the scm_t_wchar term is just for clarity.  The unsigned
> char term is definitely needed.  Negative 8-bit chars are the upper half
> of the 8-bit charset (128 - 255).  Casting them directly to scm_t_wchar
> may return 0xFFFFFF80 - 0xFFFFFFFF instead of 128-255.  I don't have any
> problem removing the scm_t_wchar cast.  Would you prefer that?

How about:

  #define STRINGBUF_INLINE_CHARS(buf) \
    ((unsigned char *) SCM_CELL_OBJECT_LOC ((buf), 1))

and changing the caller to:

  for (i = 0; i < len; i++)
    mem[i] = (scm_t_wchar) STRINGBUF_INLINE_CHARS (buf)[i];

?  That would make the intent clearer to me.

> I put it in because that information needs to be available in the
> bytecode compiler.  A slightly clearer name would probably be
> string-bytes-per-character, I suppose.

Agreed, let's take this name.

>> > +SCM_INTERNAL char *scm_to_stringn (SCM str, size_t *lenp,
>> > +                                   const char *encoding,
>> > +                                   enum iconv_ilseq_handler handler);
>>
>> I suppose this would eventually become public.  What do you think?
>> Should we use a different type for HANDLER before that happens?
>
> The simplest thing would be to make some constants like
>
> scm_c_define ("STRING_ESCAPE", scm_from_int(iconveh_escape_sequence))
>
> Something similar is done in the scm_seek function's constants, such as
> SEEK_CUR.

It's a C API so Scheme-level constants don't matter.

I was wondering whether using `enum iconv_ilseq_handler' in the public
API would be a good idea because that means that public headers include
either the system's or GNU libiconv's (or some libunistring header), in
which case `guile.pc' must include the right `-I' flag, etc.  This may
slightly complicate compilation of Guile apps.

Another downside is that Guile's API would be bound to the values and
semantics of `iconv_ilseq_handler', and bound to iconv.

One possibility to avoid this would be to define our own type similar to
`iconv_ilseq_handler'.

Thanks,
Ludo'.
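
A note on the double cast discussed in this message: the following is a minimal sketch, not Guile code, of why the `unsigned char' step matters when widening 8-bit characters. It assumes a signed 32-bit stand-in for `scm_t_wchar' (the name `wchar_sketch_t' and both function names are hypothetical), and it uses `signed char' explicitly because the signedness of plain `char' is implementation-defined.

```c
#include <stdint.h>

/* Hypothetical stand-in for scm_t_wchar, assumed here to be a
   signed 32-bit integer type.  */
typedef int32_t wchar_sketch_t;

/* Widen a signed 8-bit char directly.  Negative values are
   sign-extended, so the upper half of an 8-bit charset ends up
   in the range 0xFFFFFF80..0xFFFFFFFF when viewed as unsigned.  */
uint32_t
widen_direct (signed char c)
{
  return (uint32_t) (wchar_sketch_t) c;
}

/* Widen through unsigned char first.  The intermediate conversion
   maps negative values to 128..255 before the value is widened,
   so no sign extension takes place.  */
uint32_t
widen_via_unsigned (signed char c)
{
  return (uint32_t) (wchar_sketch_t) (unsigned char) c;
}
```

With these definitions, `widen_direct (-128)' gives 0xFFFFFF80 whereas `widen_via_unsigned (-128)' gives 128, which matches the ranges Mike describes. Making the macro itself return `unsigned char *', as proposed above, gets the second behavior without a cast at each call site.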