GH replacement proposal (includes a bit of Unicode)

From: Marius Vollmer @ 2004-04-07 13:00 UTC

Hi,

I have a partial proposal for making type conversions between Scheme
and C easier.

The first question is, do we need something different?  Is the stuff
below better enough to be worth the trouble of making everyone switch
to it?

I think it does improve upon the existing situation by being

  - thread safe (allowing true concurrency)
  - more consistent
  - allowing sophisticated internal data representation (for example
    for copy-on-write substrings, Unicode, etc.)
  - macro-free so that we can maintain binary compatibility more easily

Then there is the error handling: the functions below do not take
"subr" or "pos" arguments to indicate where the error has happened.  I
don't think they are really needed, and in any case they only provide
part of the backtrace.

Opinions?

-----

One important part of the Guile API is concerned with the conversion
between Scheme values and C values.  The functions that perform these
conversions follow a common pattern.

* Type predicates

Type predicates for C code are named like this

    int scm_is_<type> (SCM val);

They return 0 or 1. 

There are also the usual predicates that return a Scheme boolean, such
as scm_string_p.

* Conversion from C to Scheme

For a C type <type>, the function that converts it into a Scheme value
looks like

    SCM scm_from_<type> (<type> val, ...);

This function could be pronounced as "make Scheme from <type>", as a
reminder that the conversion is from <type> to a Scheme object.
Usually no error is signalled, except when not enough memory is
available.

Sometimes a function named

    SCM scm_take_<type> (<type> val, ...);

is provided.  ("let Scheme take <type>".)  This function works like
scm_from_<type> but the memory associated with VAL will be taken over
by Guile.
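
The difference between the "from" and "take" families is who owns the
memory afterwards.  Here is a minimal sketch in plain C, assuming a
hypothetical `box` type that stands in for a Scheme object.  None of
these names are Guile API; they only illustrate copying versus
adopting a buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Scheme string object; not a Guile type. */
struct box { char *data; };

/* "from" style: the box gets its own copy; the caller still owns STR. */
struct box box_from_string (const char *str)
{
  struct box b;
  size_t n = strlen (str) + 1;
  b.data = malloc (n);
  if (b.data != NULL)
    memcpy (b.data, str, n);
  return b;
}

/* "take" style: the box adopts STR, which must come from malloc.
   The caller must not free or modify STR afterwards. */
struct box box_take_string (char *str)
{
  struct box b;
  b.data = str;
  return b;
}

/* Release the memory owned by the box. */
void box_free (struct box *b)
{
  free (b->data);
  b->data = NULL;
}
```

With "from" the caller frees its own buffer as usual; with "take" the
only remaining free happens inside the owner.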

* Conversion from Scheme to C

    <type> scm_to_<type> (SCM val, ...);

("convert Scheme to <type>".)  When VAL is not representable as a
<type> or additional constraints are not satisfied, an error is
signalled.

* Concrete functions

** Booleans

  - int scm_is_bool (SCM val);

  - SCM scm_from_bool (int val);

    Return SCM_BOOL_T when val is non-zero, else return SCM_BOOL_F.

  - int scm_to_bool (SCM val);
  - int scm_is_true (SCM val);

    Return 0 when VAL is SCM_BOOL_F, else return 1.

** Integers

  - int scm_is_integer (SCM val);

  Determine whether VAL is an integer, exact or inexact.  Note that
  the number 3.0 is an inexact integer although it is stored as a
  double.

  - SCM scm_from_signed_integer (scm_t_intmax val);
  - SCM scm_from_unsigned_integer (scm_t_uintmax val);

    Return the SCM value representing the integer <val>.  The SCM
    value will always be exact.

  - scm_t_intmax scm_to_signed_integer (SCM val,
                                        scm_t_intmax min, scm_t_intmax max);
  - scm_t_uintmax scm_to_unsigned_integer (SCM val, scm_t_uintmax max);

    Convert the SCM value VAL to a C integer when it is representable
    and when it is between min and max inclusive, or between 0 and max
    inclusive.  Signal an error when it isn't.  The SCM value can be
    exact or inexact, but it must be an integer.  That is,

       scm_to_signed_integer (scm_from_double (3.0), -100, +100)

    yields the C integer 3 while

       scm_to_signed_integer (scm_from_double (3.5), -100, +100)

    is an error.

  - SCM scm_from_char (signed char);
  - SCM scm_from_short (short);
  - SCM scm_from_int (int val);
  - SCM scm_from_long (long val);
  - SCM scm_from_longlong (long long val);
  - SCM scm_from_ssize (ssize_t val);
    ...

  - SCM scm_from_uchar (unsigned char);
  - SCM scm_from_ushort (unsigned short);
  - SCM scm_from_uint (unsigned int val);
  - SCM scm_from_ulong (unsigned long val);
  - SCM scm_from_ulonglong (unsigned long long val);
  - SCM scm_from_size (size_t val);
    ...

  - signed char        scm_to_char (SCM);
  - short              scm_to_short (SCM);
  - int                scm_to_int (SCM);
  - long               scm_to_long (SCM);
  - long long          scm_to_longlong (SCM);
  - ssize_t            scm_to_ssize (SCM);
    ...

  - unsigned char      scm_to_uchar (SCM);
  - unsigned short     scm_to_ushort (SCM);
  - unsigned int       scm_to_uint (SCM);
  - unsigned long      scm_to_ulong (SCM);
  - unsigned long long scm_to_ulonglong (SCM);
  - size_t             scm_to_size (SCM);
    ...

    Convert from/to the indicated integral types, signalling errors
    when the SCM value can not be represented.  For integer types that
    are not provided for, you can use the general functions from
    above.  For example, scm_from_short (x) is the same as

        scm_from_signed_integer ((short)(x))

    and scm_to_short (x) is the same as

        ((short)(scm_to_signed_integer (x, SHRT_MIN, SHRT_MAX)))

    Thus, these functions are merely a convenience.

    Note that scm_to_char can not convert a Scheme character to a C
    char integer.  See below.
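
The range-checked conversion and its convenience wrappers can be
sketched in plain C as follows.  This is only a model under stated
assumptions: a `double` stands in for the Scheme number, a return
value of 0 stands in for signalling an error, and the names
`to_signed_integer` and `to_short` are hypothetical.  The wrapper uses
the `<limits.h>` constants `SHRT_MIN` and `SHRT_MAX`.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for scm_to_signed_integer: VAL plays the role
   of the Scheme number.  Returns 1 and stores the result in *OUT when
   VAL is an integer in [MIN, MAX]; returns 0 where the proposal would
   signal an error.  Assumption: MIN and MAX are exactly representable
   as doubles. */
int to_signed_integer (double val, intmax_t min, intmax_t max,
                       intmax_t *out)
{
  /* Written so that NaN fails the test too. */
  if (!(val >= (double) min && val <= (double) max))
    return 0;
  intmax_t n = (intmax_t) val;      /* in range, so the cast is defined */
  if ((double) n != val)            /* e.g. 3.5 has a fractional part */
    return 0;
  *out = n;
  return 1;
}

/* Convenience wrapper in the style of scm_to_short. */
int to_short (double val, short *out)
{
  intmax_t n;
  if (!to_signed_integer (val, SHRT_MIN, SHRT_MAX, &n))
    return 0;
  *out = (short) n;                 /* n is known to fit */
  return 1;
}
```

In this model 3.0 converts to 3 with bounds -100 and +100, while 3.5
and 101.0 are both rejected.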

** Floating point numbers

We don't go to such a great length to cover all possible types
here. "double" ought to be enough, no?

  - int scm_is_real (SCM val);

  Determine whether VAL is a real number, inexact or exact.  Note that
  a number such as 1/3 or 0 is real, although it is not stored as a
  double.

  - SCM scm_from_double (double val);

    Return the SCM value corresponding to VAL.  The SCM value will be
    'inexact' as far as scm_inexact_p is concerned but will be
    exactly equal to VAL.  When you want to have an exact SCM value,
    use

        scm_inexact_to_exact (scm_from_double (val))

    This will yield an exact fraction.

  - double scm_to_double (SCM val);

    Convert VAL to the closest number representable as a double.
    Numbers whose magnitude is too large are converted into +Inf or
    -Inf, according to their sign.
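
The remark under scm_from_double that an inexact value is still
exactly equal to VAL, and that it converts to an exact fraction, rests
on every finite double being an integer divided by a power of two.
The plain C sketch below makes that visible.  The function name
`double_to_fraction` is hypothetical, and it assumes VAL is finite
with magnitude below 2^63.

```c
/* Write VAL as NUM / 2^EXP2 in lowest terms.
   Assumption: VAL is finite and |VAL| < 2^63, so every cast to
   long long below is well defined. */
void double_to_fraction (double val, long long *num, int *exp2)
{
  int k = 0;
  /* Doubling is exact in binary floating point, so nothing is lost.
     The loop stops at the first integer, which is odd whenever k > 0. */
  while (val != (double) (long long) val)
    {
      val *= 2.0;
      k++;
    }
  *num = (long long) val;
  *exp2 = k;
}
```

For 0.75 this gives 3 / 2^2.  For 0.1 it gives
3602879701896397 / 2^55, which is the value the double actually holds
rather than one tenth.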

** Complex numbers

  - int scm_is_complex (SCM val);

  Determine whether VAL is a complex number, inexact or exact.  Note
  that a number such as 1/3 is complex, although it is not stored as
  two doubles.

Complex numbers can be regarded as a compound type and need no
dedicated conversion functions.  For example, you can do

    scm_make_rectangular (scm_from_double (0.0), scm_from_double (1.0))

or

    double imag = scm_to_double (scm_imag_part (z));

but there are also convenience functions that are actually a bit more
efficient:

  - SCM scm_from_complex_double (double re, double im);
  - double scm_to_real_part_double (SCM z);
  - double scm_to_imag_part_double (SCM z);

But remember to use the generic functions scm_make_rectangular,
scm_real_part, etc if you don't care whether the parts of a complex
number are floating point numbers or not.  For example, Guile might
someday offer complex numbers where the real part is a fraction
(currently it is always a double) and it is good to be prepared for
this by not treating the parts of a complex as doubles when it is not
needed.

** Characters

A Scheme character in Guile is equivalent to a Unicode code point.

  - int scm_is_character (SCM val);
  - long scm_to_unicode (SCM ch);
  - SCM  scm_from_unicode (long code);

** Strings

Strings present the new problem that memory needs to be allocated or
found for storing the result.  Also, when new memory has been
allocated, one needs to make sure that it isn't leaked in the case of
non-local exits (like from errors in subsequent conversions).  Such a
cleanup action can be registered with scm_frame_unwind_handler, which
see.
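
To illustrate the leak problem and the idea of a registered cleanup,
here is a small model in plain C that uses setjmp/longjmp as the
non-local exit.  It is not how scm_frame_unwind_handler is
implemented, and every name in it is hypothetical.  It only shows why
a buffer allocated by an earlier conversion needs a registered cleanup
when a later conversion fails.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names are Guile API. */
jmp_buf error_exit;        /* where a signalled error lands      */
void *pending[8];          /* buffers to free on unwind          */
int n_pending = 0;
int n_freed = 0;           /* bookkeeping for the demonstration  */

/* Register PTR to be freed if the frame is left through an error. */
void unwind_free_later (void *ptr)
{
  if (n_pending < 8)
    pending[n_pending++] = ptr;
}

/* Run the registered cleanups, newest first. */
void run_unwind_handlers (void)
{
  while (n_pending > 0)
    {
      free (pending[--n_pending]);
      n_freed++;
    }
}

/* Stand-in for a later conversion that signals an error. */
void failing_conversion (void)
{
  longjmp (error_exit, 1);
}

/* Returns 1 if the error path was taken, 0 otherwise. */
int convert_two_values (void)
{
  if (setjmp (error_exit) != 0)
    {
      run_unwind_handlers ();   /* the first buffer is released here */
      return 1;
    }
  char *first = malloc (16);    /* result of the first conversion */
  unwind_free_later (first);
  failing_conversion ();        /* does not return normally */
  n_pending = 0;                /* normal exit: caller keeps FIRST */
  return 0;
}
```

Without the registration step, the buffer from the first conversion
would be unreachable once the error path is taken.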

  - int scm_is_string (SCM val);

  - SCM scm_from_locale_string (unsigned char *str, ssize_t len);

  Return a new Scheme string initialized with STR, a string encoded
  according to the current locale.  When LEN is -1, STR must be
  zero-terminated and its length is found that way.  Otherwise LEN
  gives the length of STR.

  - SCM scm_from_utf8_string (unsigned char *str, ssize_t len);

  Same as above, but STR is encoded in UTF-8.  Future versions of
  Guile will use UTF-8 internally and then this function will not need
  to perform any conversions at all.

  - SCM scm_take_utf8_string (unsigned char *str, ssize_t len);

  Same as above, but the memory for STR is taken over by Guile.  It
  will eventually be freed using libc 'free'.

  - unsigned char *scm_to_locale_string (SCM str, size_t *lenp);

  Convert STR into a C string that is encoded as specified by the
  current locale.  Memory is allocated for the C string that can be
  freed with 'free'.

  When the current locale can not encode STR, an error is signalled.

  When LENP is not NULL, the number of bytes contained in the returned
  string is stored in *LENP.  The string is zero-terminated, but it
  might contain zero characters in the middle.

  When LENP is NULL and the string does indeed contain a zero
  character, it is not encodable and an error is signalled.

  - unsigned char *scm_to_utf8_string (SCM str, size_t *lenp);

  Same as above but returns a UTF-8 encoded string.  This will always
  work when LENP is non-NULL.

  [ More encodings can be specified later, for example by just
    referring to the character sets supported by 'iconv'.  The above
    two, locale and utf8, are needed for transitioning Guile to
    Unicode.  Right now, strings are in the locale encoding but in the
    future they will be in UTF-8. ]
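
The LEN and LENP conventions described above can be modelled in plain
C.  This sketch leaves out encodings entirely and uses `char` rather
than `unsigned char` for brevity.  The names `effective_length` and
`copy_out` are hypothetical, and a NULL return stands in for
signalling an error.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

/* Hypothetical helper: resolve the LEN convention of the
   scm_from_*_string functions.  LEN == -1 means "STR is
   zero-terminated, measure it". */
size_t effective_length (const char *str, ssize_t len)
{
  return len == -1 ? strlen (str) : (size_t) len;
}

/* Hypothetical model of the LENP contract of the scm_to_*_string
   functions.  BYTES/N stand in for the already-encoded contents of a
   Scheme string.  Returns a malloc'ed, zero-terminated copy, or NULL
   where the proposal would signal an error. */
char *copy_out (const char *bytes, size_t n, size_t *lenp)
{
  /* Without LENP the caller cannot see past an embedded zero. */
  if (lenp == NULL && memchr (bytes, 0, n) != NULL)
    return NULL;
  char *result = malloc (n + 1);
  if (result == NULL)
    return NULL;
  memcpy (result, bytes, n);
  result[n] = '\0';             /* always terminated */
  if (lenp != NULL)
    *lenp = n;
  return result;
}
```

A caller that passes a non-NULL LENP gets every byte, including
embedded zeros; a caller that passes NULL is guaranteed that strlen on
the result tells the whole story.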

The above functions always return newly allocated memory.  When that
is deemed too expensive, the following functions can be used instead.
However, care must be taken to use them correctly and reasonably.

  - scm_lock_heap ();
  - scm_unlock_heap ();

  These two functions lock and unlock all SCM objects (the heap).  The
  heap should not be locked for long periods of time and no calls to
  'normal' libguile functions are allowed while it is locked.  A
  function is 'normal' unless it is specifically documented to be
  useable with a locked heap.  (Indeed, most 'unnormal' functions can
  _only_ be used while the heap is locked.)

  You can not lock the heap twice.  Calling scm_lock_heap while the
  heap is already locked results in undefined behavior.  Likewise,
  calling scm_unlock_heap when the heap is not locked is verboten.

  - const unsigned char *scm_l_get_utf8_string_mem (SCM str);

  Return a pointer to the internal UTF-8 bytes of STR.  This function
  can only be called while the heap is locked and the returned pointer
  becomes invalid when the heap is unlocked later on.  The string is
  _not_ guaranteed to be zero-terminated; you _must_ use
  scm_l_get_utf8_string_len (see below) to find its length.

  You are not allowed to modify the string contents.

  (The "scm_l_" prefix denotes a function that must be called with a
  locked heap.)

  - size_t scm_l_get_utf8_string_len (SCM str);

  Return the length in bytes of STR.  Heap must be locked.
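
The intended usage pattern is: lock, read the bytes in place using the
explicit length, unlock, and never touch the pointer again.  The
following plain C model shows that discipline.  Everything in it is a
hypothetical stand-in (the `fake_` functions, the `fake_string` type,
the flag used as a lock); it only mirrors the rules stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the proposed API; not Guile code. */
struct fake_string
{
  const unsigned char *bytes;   /* not zero-terminated */
  size_t len;
};

int heap_locked = 0;

/* Locking twice, or unlocking when not locked, is not allowed. */
void fake_lock_heap (void)   { assert (!heap_locked); heap_locked = 1; }
void fake_unlock_heap (void) { assert (heap_locked);  heap_locked = 0; }

const unsigned char *fake_l_get_mem (const struct fake_string *s)
{
  assert (heap_locked);         /* "scm_l_" functions need the lock */
  return s->bytes;
}

size_t fake_l_get_len (const struct fake_string *s)
{
  assert (heap_locked);
  return s->len;
}

/* Count occurrences of BYTE without copying the string.  The internal
   pointer is used only between lock and unlock. */
size_t count_byte (const struct fake_string *s, unsigned char byte)
{
  size_t count = 0;
  fake_lock_heap ();
  const unsigned char *p = fake_l_get_mem (s);
  size_t n = fake_l_get_len (s);   /* explicit length, no terminator */
  for (size_t i = 0; i < n; i++)
    if (p[i] == byte)
      count++;
  fake_unlock_heap ();             /* P must not be used after this */
  return count;
}
```

The loop is short and calls nothing else, which matches the advice to
keep the locked region brief.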

** Symbols

Symbols have strings as their names and you can get that name via
scm_symbol_to_string.  However, it is more efficient to convert
to/from a symbol directly.

  - int scm_is_symbol (SCM val);
  - SCM scm_from_locale_symbol (unsigned char *str, ssize_t len);
  - SCM scm_from_utf8_symbol (unsigned char *str, ssize_t len);
  - SCM scm_take_utf8_symbol (unsigned char *str, ssize_t len);
  - unsigned char *scm_to_locale_symbol (SCM str, size_t *lenp);
  - unsigned char *scm_to_utf8_symbol (SCM str, size_t *lenp);
  - const unsigned char *scm_l_get_utf8_symbol_mem (SCM str);
  - size_t scm_l_get_utf8_symbol_len (SCM str);

** Uniform vectors

[ Uniform vectors should get the same kind of support as strings, but
without the encoding business of course. ]

  - int scm_is_u8vector (SCM val);
  - SCM scm_from_u8vector (unsigned char *vec, size_t len);
  - SCM scm_take_u8vector (unsigned char *vec, size_t len);
  - unsigned char *scm_to_u8vector (SCM vec, size_t *lenp);
  - unsigned char *scm_l_get_u8vector_mem (SCM vec);
  - size_t scm_l_get_u8vector_len (SCM vec);

  etc.

** Compound types

  - int scm_is_pair (SCM val);
  - SCM scm_car (SCM pair);
  - SCM scm_cdr (SCM pair);

  - int scm_is_list (SCM val);
  - SCM scm_c_list_ref (SCM list, int idx);
  - SCM scm_c_list_set (SCM list, int idx, SCM val);
  - int scm_c_list_length (SCM list);

  - int scm_is_vector (SCM val);
  - SCM scm_c_vector_ref (SCM vec, int idx);
  - SCM scm_c_vector_set (SCM vec, int idx, SCM val);
  - int scm_c_vector_length (SCM vec);

Additional types can be handled with code like

  if (scm_is_true (scm_procedure_p (val)))
    ...
