From: Marius Vollmer
Newsgroups: gmane.lisp.guile.devel
Subject: GH replacement proposal (includes a bit of Unicode)
Date: Wed, 07 Apr 2004 15:00:06 +0200
To: guile-devel@gnu.org

Hi,

I have a partial proposal for making type conversions between Scheme
and C easier.

The first question is, do we need something different?  Is the stuff
below better enough to be worth the trouble of making everyone switch
to it?  I think it does improve upon the existing situation by being

  - thread safe (allowing true concurrency)
  - more consistent
  - allowing sophisticated internal data representation (for example
    for copy-on-write substrings, Unicode, etc.)
  - macro-free so that we can maintain binary compatibility more easily

Then there is the error handling: the functions below do not take
"subr" or "pos" arguments to indicate where the error has happened.  I
don't think they are really needed, and in any case they only provide
part of the backtrace.

Opinions?

-----

One important part of the Guile API is concerned with the conversion
between Scheme values and C values.  The functions that perform these
conversions follow a common pattern.

* Type predicates

Type predicates for C code are named like this

    int scm_is_<type> (SCM val);

They return 0 or 1.

There are also the usual predicates that return a Scheme boolean, such
as scm_string_p.

* Conversion from C to Scheme

For a C type <type>, the function that converts it into a Scheme value
looks like

    SCM scm_from_<type> (<type> val, ...);

This function could be pronounced as "make Scheme from <type>" to
remember that the conversion is from <type> to a Scheme object.

No error will usually be signalled, except when not enough memory is
available.

Sometimes a function named

    SCM scm_take_<type> (<type> val, ...);

is provided.  ("let Scheme take <type>".)
This function works like scm_from_<type> but the memory associated with
VAL will be taken over by Guile.

* Conversion from Scheme to C

    <type> scm_to_<type> (SCM val, ...);

("convert Scheme to <type>".)  When VAL is not representable as a
<type> or additional constraints are not satisfied, an error is
signalled.

* Concrete functions

** Booleans

  - int scm_is_bool (SCM val);

  - SCM scm_from_bool (int val);

    Return SCM_BOOL_T when VAL is non-zero, else return SCM_BOOL_F.

  - int scm_to_bool (SCM val);
  - int scm_is_true (SCM val);

    Return 0 when VAL is SCM_BOOL_F, else return 1.

** Integers

  - int scm_is_integer (SCM val);

    Determine whether VAL is an integer, exact or inexact.  Note that
    the number 3.0 is an inexact integer although it is stored as a
    double.

  - SCM scm_from_signed_integer (scm_t_intmax val);
  - SCM scm_from_unsigned_integer (scm_t_uintmax val);

    Return the SCM value representing the integer VAL.  The SCM value
    will always be exact.

  - scm_t_intmax scm_to_signed_integer (SCM val,
                                        scm_t_intmax min, scm_t_intmax max);
  - scm_t_uintmax scm_to_unsigned_integer (SCM val, scm_t_uintmax max);

    Convert the SCM value VAL to a C integer when it is representable
    and when it is between min and max inclusive, or between 0 and max
    inclusive.  Signal an error when it isn't.  The SCM value can be
    exact or inexact, but it must be an integer.  That is,

        scm_to_signed_integer (scm_from_double (3.0), -100, +100)

    yields the C integer 3 while

        scm_to_signed_integer (scm_from_double (3.5), -100, +100)

    is an error.

  - SCM scm_from_char (signed char);
  - SCM scm_from_short (short);
  - SCM scm_from_int (int val);
  - SCM scm_from_long (long val);
  - SCM scm_from_longlong (long long val);
  - SCM scm_from_ssize (ssize_t val);
    ...

  - SCM scm_from_uchar (unsigned char);
  - SCM scm_from_ushort (unsigned short);
  - SCM scm_from_uint (unsigned int val);
  - SCM scm_from_ulong (unsigned long val);
  - SCM scm_from_ulonglong (unsigned long long val);
  - SCM scm_from_size (size_t val);
    ...

  - signed char scm_to_char (SCM);
  - short scm_to_short (SCM);
  - int scm_to_int (SCM);
  - long scm_to_long (SCM);
  - long long scm_to_longlong (SCM);
  - ssize_t scm_to_ssize (SCM);
    ...

  - unsigned char scm_to_uchar (SCM);
  - unsigned short scm_to_ushort (SCM);
  - unsigned int scm_to_uint (SCM);
  - unsigned long scm_to_ulong (SCM);
  - unsigned long long scm_to_ulonglong (SCM);
  - size_t scm_to_size (SCM);
    ...

    Convert from/to the indicated integral types, signalling errors
    when the SCM value can not be represented.  For integer types
    that are not provided for, you can use the general functions from
    above.  For example, scm_from_short (x) is the same as

        scm_from_signed_integer ((short)(x))

    and scm_to_short (x) is the same as

        ((short)(scm_to_signed_integer (x, SHRT_MIN, SHRT_MAX)))

    Thus, these functions are merely a convenience.

    Note that scm_to_char can not convert a Scheme character to a C
    char integer.  See below.

** Floating point numbers

We don't go to such a great length to cover all possible types
here.  "double" ought to be enough, no?

  - int scm_is_real (SCM val);

    Determine whether VAL is a real number, inexact or exact.  Note
    that a number such as 1/3 or 0 is real, although it is not stored
    as a double.

  - SCM scm_from_double (double val);

    Return the SCM value corresponding to VAL.  The SCM value will be
    'inexact' as far as scm_inexact_p is concerned but will be exactly
    equal to VAL.  When you want to have an exact SCM value, use

        scm_inexact_to_exact (scm_from_double (val))

    This will yield an exact fraction.

  - double scm_to_double (SCM val);

    Convert VAL to the closest number representable as a double.
    Numbers that are too large or too small are converted into +Inf
    or -Inf, respectively.

** Complex numbers

  - int scm_is_complex (SCM val);

    Determine whether VAL is a complex number, inexact or exact.  Note
    that a number such as 1/3 is complex, although it is not stored as
    two doubles.

Complex numbers can be regarded as a compound type and need no
dedicated conversion functions.
For example, you can do

    scm_make_rectangular (scm_from_double (0.0), scm_from_double (1.0))

or

    double imag = scm_to_double (scm_imag_part (z));

but there are also convenience functions that are actually a bit more
efficient:

  - SCM scm_from_complex_double (double re, double im);
  - double scm_to_real_part_double (SCM z);
  - double scm_to_imag_part_double (SCM z);

But remember to use the generic functions scm_make_rectangular,
scm_real_part, etc. if you don't care whether the parts of a complex
number are floating point numbers or not.  For example, Guile might
someday offer complex numbers where the real part is a fraction
(currently it is always a double) and it is good to be prepared for
this by not treating the parts of a complex as doubles when it is not
needed.

** Characters

A Scheme character in Guile is equivalent to a Unicode code point.

  - int scm_is_character (SCM val);
  - long scm_to_unicode (SCM ch);
  - SCM scm_from_unicode (long code);

** Strings

Strings present the new problem that memory needs to be allocated or
found for storing the result.  Also, when new memory has been
allocated, one needs to make sure that it isn't leaked in the case of
non-local exits (like from errors in subsequent conversions).  Such a
cleanup action can be registered with scm_frame_unwind_handler, which
see.

  - int scm_is_string (SCM val);

  - SCM scm_from_locale_string (unsigned char *str, ssize_t len);

    Return a new Scheme string initialized with STR, a string encoded
    according to the current locale.  When LEN is -1, STR must be
    zero-terminated and its length is found that way.  Otherwise LEN
    gives the length of STR.

  - SCM scm_from_utf8_string (unsigned char *str, ssize_t len);

    Same as above, but STR is encoded in UTF-8.  Future versions of
    Guile will use UTF-8 internally and then this function will not
    need to perform any conversions at all.

  - SCM scm_take_utf8_string (unsigned char *str, ssize_t len);

    Same as above, but the memory for STR is taken over by Guile.
    It will eventually be freed using libc 'free'.

  - unsigned char *scm_to_locale_string (SCM str, size_t *lenp);

    Convert STR into a C string that is encoded as specified by the
    current locale.  Memory is allocated for the C string that can be
    freed with 'free'.

    When the current locale can not encode STR, an error is signalled.

    When LENP is not NULL, the number of bytes contained in the
    returned string is stored in *LENP.  The string is
    zero-terminated, but it might contain zero characters in the
    middle.

    When LENP is NULL and the string does indeed contain a zero
    character, it is not encodable and an error is signalled.

  - unsigned char *scm_to_utf8_string (SCM str, size_t *lenp);

    Same as above but returns a UTF-8 encoded string.  This will
    always work when LENP is non-NULL.

  [ More encodings can be specified later, for example by just
    referring to the character sets supported by 'iconv'.  The above
    two, locale and utf8, are needed for transitioning Guile to
    Unicode.  Right now, strings are in the locale encoding but in the
    future they will be in UTF-8. ]

The above functions always return newly allocated memory.  When that
is deemed too expensive, the following functions can be used instead.
However, care must be taken to use them correctly and reasonably.

  - scm_lock_heap ();
  - scm_unlock_heap ();

    These two functions lock and unlock all SCM objects (the heap).
    The heap should not be locked for long periods of time and no
    calls to 'normal' libguile functions are allowed while it is
    locked.  A function is 'normal' unless it is specifically
    documented to be useable with a locked heap.  (Indeed, most
    'unnormal' functions can _only_ be used while the heap is locked.)

    You can not lock the heap twice.  Calling scm_lock_heap while the
    heap is already locked results in undefined behavior.  Likewise,
    calling scm_unlock_heap when the heap is not locked is verboten.

  - const unsigned char *scm_l_get_utf8_string_mem (SCM str);

    Return a pointer to the internal UTF-8 bytes of STR.
    This function can only be called while the heap is locked and the
    returned pointer becomes invalid when the heap is unlocked later
    on.  The string is _not_ guaranteed to be zero-terminated; you
    _must_ use scm_l_get_utf8_string_len (see below).

    You are not allowed to modify the string contents.

    (The "scm_l_" prefix denotes a function that must be called with a
    locked heap.)

  - size_t scm_l_get_utf8_string_len (SCM str);

    Return the length in bytes of STR.  Heap must be locked.

** Symbols

Symbols have strings as their names and you can get that name via
scm_symbol_to_string.  However, it is more efficient to convert
to/from a symbol directly.

  - int scm_is_symbol (SCM val);
  - SCM scm_from_locale_symbol (unsigned char *str, ssize_t len);
  - SCM scm_from_utf8_symbol (unsigned char *str, ssize_t len);
  - SCM scm_take_utf8_symbol (unsigned char *str, ssize_t len);
  - unsigned char *scm_to_locale_symbol (SCM str, size_t *lenp);
  - unsigned char *scm_to_utf8_symbol (SCM str, size_t *lenp);
  - const unsigned char *scm_l_get_utf8_symbol_mem (SCM str);
  - size_t scm_l_get_utf8_symbol_len (SCM str);

** Uniform vectors

  [ Uniform vectors should get the same kind of support as strings,
    but without the encoding business of course. ]

  - int scm_is_u8vector (SCM val);
  - SCM scm_from_u8vector (unsigned char *vec, size_t len);
  - SCM scm_take_u8vector (unsigned char *vec, size_t len);
  - unsigned char *scm_to_u8vector (SCM vec, size_t *lenp);
  - unsigned char *scm_l_get_u8vector_mem (SCM vec);
  - size_t scm_l_get_u8vector_len (SCM vec);

  etc.
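
  [ The locked-heap accessors for strings (and their symbol and
    u8vector counterparts) would presumably be used along the lines
    of the sketch below: lock, read the length, copy the bytes out,
    unlock, and do nothing else in between.  Only the four scm_*
    names and their signatures come from the proposal.  The toy SCM
    type, the heap_locked flag and copy_utf8_bytes are hypothetical
    stand-ins so that the sketch compiles without libguile; they say
    nothing about how Guile would implement any of this. ]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* --- Hypothetical stand-ins, NOT libguile ------------------------- */

typedef struct { const unsigned char *bytes; size_t len; } toy_string;
typedef toy_string *SCM;        /* assumption: a toy object, not Guile's SCM */

static int heap_locked = 0;

/* Locking twice or unlocking an unlocked heap is forbidden by the
   proposal; the stand-ins catch both with an assertion.  */
static void scm_lock_heap (void)   { assert (!heap_locked); heap_locked = 1; }
static void scm_unlock_heap (void) { assert (heap_locked);  heap_locked = 0; }

static const unsigned char *scm_l_get_utf8_string_mem (SCM str)
{
  assert (heap_locked);         /* scm_l_ functions need a locked heap */
  return str->bytes;
}

static size_t scm_l_get_utf8_string_len (SCM str)
{
  assert (heap_locked);
  return str->len;
}

/* --- The usage pattern --------------------------------------------- */

/* Store the byte length of STR in *LENP.  When the bytes fit into BUF
   (capacity CAP), copy them and return 1; otherwise copy nothing and
   return 0.  The result is not zero-terminated.  Between lock and
   unlock only scm_l_ functions and memcpy are called, and the internal
   pointer is never used after the heap has been unlocked.  */
static int copy_utf8_bytes (SCM str, unsigned char *buf, size_t cap,
                            size_t *lenp)
{
  int fits;

  scm_lock_heap ();
  *lenp = scm_l_get_utf8_string_len (str);
  fits = (*lenp <= cap);
  if (fits)
    memcpy (buf, scm_l_get_utf8_string_mem (str), *lenp);
  scm_unlock_heap ();

  return fits;
}
```

  [ A caller whose buffer turns out to be too small can look at *LENP,
    get a bigger buffer while the heap is unlocked, and try again. ]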

** Compound types

  - int scm_is_pair (SCM val);
  - SCM scm_car (SCM pair);
  - SCM scm_cdr (SCM pair);

  - int scm_is_list (SCM val);
  - SCM scm_c_list_ref (SCM list, int idx);
  - SCM scm_c_list_set (SCM list, int idx, SCM val);
  - int scm_c_list_length (SCM list);

  - int scm_is_vector (SCM val);
  - SCM scm_c_vector_ref (SCM vec, int idx);
  - SCM scm_c_vector_set (SCM vec, int idx, SCM val);
  - int scm_c_vector_length (SCM vec);

Additional types can be handled with code like

    if (scm_is_true (scm_procedure_p (val)))
      ...
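
  [ As a sketch of how the pair functions and the integer conversions
    might combine, the code below sums a list of integers by walking
    it with scm_is_pair, scm_car and scm_cdr.  The scm_* names and
    signatures are the ones listed above.  Everything else (the toy
    object layout, TOY_EOL, sum_int_list) is a hypothetical stand-in
    so that the sketch compiles without libguile.  In particular, the
    stand-ins use assert where the real functions would signal a
    Scheme error. ]

```c
#include <assert.h>
#include <stddef.h>

/* --- Hypothetical stand-ins, NOT libguile ------------------------- */

enum toy_tag { TOY_INT, TOY_PAIR };

struct toy_obj
{
  enum toy_tag tag;
  int value;                    /* used when tag == TOY_INT  */
  struct toy_obj *car, *cdr;    /* used when tag == TOY_PAIR */
};

typedef struct toy_obj *SCM;    /* assumption: a toy object, not Guile's SCM */
#define TOY_EOL ((SCM) NULL)    /* the empty list */

static int scm_is_pair (SCM val)
{
  return val != TOY_EOL && val->tag == TOY_PAIR;
}

static SCM scm_car (SCM pair) { assert (scm_is_pair (pair)); return pair->car; }
static SCM scm_cdr (SCM pair) { assert (scm_is_pair (pair)); return pair->cdr; }

static int scm_to_int (SCM val)
{
  assert (val != TOY_EOL && val->tag == TOY_INT);
  return val->value;
}

/* --- The usage pattern --------------------------------------------- */

/* Sum the elements of LIST, stopping at the first object that is not
   a pair (the empty list, for a proper list).  */
static int sum_int_list (SCM list)
{
  int sum = 0;

  while (scm_is_pair (list))
    {
      sum += scm_to_int (scm_car (list));
      list = scm_cdr (list);
    }
  return sum;
}
```

  [ Walking with scm_cdr visits every pair once.  A loop over
    scm_c_list_ref would presumably have to start from the head of
    the list for every index. ]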