From: ludo@gnu.org (Ludovic Courtès)
To: Mike Gran <spk121@yahoo.com>
Cc: guile-devel@gnu.org
Subject: Re: Wide string strategies
Date: Fri, 10 Apr 2009 09:57:26 +0200
Message-ID: <87r601aql5.fsf@gnu.org>
In-Reply-To: <1239334758.7191.104.camel@localhost.localdomain> (Mike Gran's message of "Thu, 09 Apr 2009 20:39:18 -0700")

Hi Mike,

Mike Gran <spk121@yahoo.com> writes:

> On Thu, 2009-04-09 at 22:25 +0200, Ludovic Courtès wrote: 

>> All the POSIX interface needs fast access to ASCII strings.  How about
>> something like:
>> 
>>   const char *layout = scm_i_ascii_symbol_chars (SCM_PACK (slayout));
>> 
>> where `scm_i_ascii_symbol_chars ()' throws an exception if its argument
>> is a non-ASCII symbol?
>> 
>> This would mean special-casing ASCII stringbufs so that we can treat
>> them as C strings.
>
> OK.  Fast ASCII strings for the evaluator and for POSIX should be easy
> enough.  Are there any other modules that definitely require fast
> strings?

None that I can think of.
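
To make the ASCII special case above concrete, here is a minimal sketch
of the check it implies.  The helper name `ascii_chars_or_null' is
hypothetical and is not part of Guile; it returns NULL where the
proposed `scm_i_ascii_symbol_chars ()' would throw an exception, and it
works on a plain byte buffer rather than on a real stringbuf.

```c
#include <stddef.h>

/* Hypothetical helper, not Guile's API: return BUF unchanged if all LEN
   bytes are 7-bit ASCII, so that it can be used directly as a C string
   buffer; return NULL otherwise.  The function proposed in the thread
   would throw an exception instead of returning NULL.  */
static const char *
ascii_chars_or_null (const char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if ((unsigned char) buf[i] > 0x7F)
      return NULL;
  return buf;
}
```

The point of the special case is that the ASCII path needs no copy and
no conversion: the caller gets the original pointer back.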

Actually, for the file system interface, for instance, it's even
trickier: the encoding of file names usually isn't specified, but some
apps/libraries have their opinion on that, e.g., GLib
(http://library.gnome.org/devel/glib/unstable/glib-File-Utilities.html).
We should probably follow their lead here, but that's a secondary
problem anyway.

> Also, the interaction between strings and sockets needs more thought.
> If sendto and recvfrom are used for datagram transmission, as their
> docstrings suggest, then locale string conversion could be a
> bad idea.  (And, these functions should also operate on u8vectors, but
> that's another issue.)

Agreed.

> To be more general, I know some apps depend on 8-bit strings and use
> them as storage of non-string binary data.

Yes, notably because of `sendto' et al. that take a string.

> I think SND falls into this
> category.  I wonder if ultimately wide strings would have to be a
> run-time option that is off by default.  But I am (choose your English
> idiom here) getting ahead of myself, or jumping the gun, or putting the
> cart before the horse.

I don't have any idea of how we could usefully handle that.

Eventually, it may be a good idea to deprecate `(sendto "foobar")' in
favor of a variant that takes a bytevector or some such.

>> > +SCM_INTERNAL int scm_i_string_ref_eq_int (SCM str, size_t x, int c);
>> 
>> Does it assume that `int' is at least 32 bits wide?
>
> I suppose it does.  But, I only used it to compare to the output of
> scm_getc which also returns an int.

I meant, is the intent that the `c' argument contains a codepoint?
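
If the `int' argument is indeed meant to hold a codepoint, the width
assumption can be stated at compile time.  The following is only a
sketch: `codepoint_eq_int' is a hypothetical stand-in for
`scm_i_string_ref_eq_int', and it assumes a UTF-32 buffer rather than
Guile's actual string representation.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Largest Unicode codepoint.  */
#define MAX_CODEPOINT 0x10FFFF

/* Fail the build if `int' cannot hold every codepoint.  Negative
   values remain available for sentinels such as EOF.  */
_Static_assert (INT_MAX >= MAX_CODEPOINT,
                "int is too narrow to hold a Unicode codepoint");

/* Hypothetical stand-in for scm_i_string_ref_eq_int: compare the
   codepoint at index X of the UTF-32 buffer CHARS with C.  A negative
   C (for instance EOF as returned by a getc-style function) never
   matches.  */
static int
codepoint_eq_int (const uint32_t *chars, size_t x, int c)
{
  return c >= 0 && chars[x] == (uint32_t) c;
}
```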

>> > +SCM_INTERNAL char *scm_i_string_to_write_sz (SCM str);
>> > +SCM_INTERNAL scm_t_uint8 *scm_i_string_to_u8sz (SCM str);
>> > +SCM_INTERNAL SCM scm_i_string_from_u8sz (const scm_t_uint8 *str);
>> > +SCM_INTERNAL const char *scm_i_string_to_failsafe_ascii_sz (SCM str);
>> > +SCM_INTERNAL const char *scm_i_symbol_to_failsafe_ascii_sz (SCM str);
>> 
>> What does "sz" mean?
>
> Back in the day, "sz" was Microsoft-speak for the pointer to the first
> character of a null-terminated char string.  By not knowing that, you
> have demonstrated that you remain unpolluted. ;-) I probably was trying
> to avoid writing "scm_i_string_to_string."

Ouch, I *think* I had seen it in some places but never knew where it
comes from.  :-)

How about:

  SCM scm_i_from_ascii_string (const scm_t_uint8 *str);

and similar?

>> 
>> > +/* For ASCII strings, SUB can be used to represent an invalid
>> > +   character.  */
>> > +#define SCM_SUB ('\x1A')
>> 
>> Why SUB?  How about `SCM_I_SUB_CHAR', `SCM_I_INVALID_ASCII_CHAR' or
>> similar?
>
> If you're asking why SUB is set to 0x1A, the standard ECMA-48 says 0x1A
> should be used to indicate an invalid ASCII character.

I suspected that.  Then `SCM_I_SUB_CHAR' may be a good name, perhaps
with a comment saying that this is the "official SUB character".
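
As an illustration of how such a constant might be used by a "failsafe
ASCII" conversion, here is a rough sketch.  `to_failsafe_ascii' is a
hypothetical helper, not the function from the patch, and it works byte
by byte, so a multi-byte character yields one SUB per byte.

```c
#include <stddef.h>

/* The official SUB character (0x1A), used here to mark bytes that are
   not valid ASCII.  The macro name is the one suggested above.  */
#define SCM_I_SUB_CHAR ('\x1A')

/* Hypothetical helper: copy LEN bytes from SRC to DST, replacing every
   non-ASCII byte with SUB, and null-terminate the result.  DST must
   have room for LEN + 1 bytes.  */
static void
to_failsafe_ascii (const unsigned char *src, size_t len, char *dst)
{
  for (size_t i = 0; i < len; i++)
    dst[i] = src[i] <= 0x7F ? (char) src[i] : SCM_I_SUB_CHAR;
  dst[len] = '\0';
}
```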

Thanks!

Ludo'.



