unofficial mirror of guile-devel@gnu.org 
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
       [not found] <E1MCZex-00070s-SB@cvs.savannah.gnu.org>
@ 2009-06-05 14:10 ` Ludovic Courtès
  2009-06-05 14:26   ` Mike Gran
  0 siblings, 1 reply; 5+ messages in thread
From: Ludovic Courtès @ 2009-06-05 14:10 UTC (permalink / raw)
  To: Michael Gran; +Cc: guile-devel

Hi Mike,

A few random thoughts:

"Michael Gran" <spk121@yahoo.com> writes:

> -  buf = scm_to_locale_stringn (scm_c_substring (str, start, end), &len);
> +  buf = scm_to_encoded_stringn (scm_c_substring (str, start, end), &len, 
> +				pt->encoding, pt->ilseq_handler);

I'd call that `scm_to_stringn ()' since it's the most generic form (and
a string is always "encoded", anyway).
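To illustrate why a single generic name fits, here is a minimal, self-contained sketch of a conversion routine that takes the target encoding and the invalid-sequence ("ilseq") policy as explicit parameters instead of reading them from the locale. The names `to_stringn`, `encoding_t` and `ilseq_t` are hypothetical and hand-rolled for illustration; this is not Guile's `scm_to_stringn` or libunistring's API, and it only handles ASCII, Latin-1 and UTF-8.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical encodings and invalid-sequence policies. */
typedef enum { ENC_ASCII, ENC_LATIN1, ENC_UTF8 } encoding_t;
typedef enum { ILSEQ_ERROR, ILSEQ_SUBSTITUTE } ilseq_t;

/* Write the encoding of code point CP into OUT.  Return the number of
   bytes written, or 0 if CP cannot be represented in ENC. */
static size_t encode_one (uint32_t cp, encoding_t enc, unsigned char *out)
{
  switch (enc)
    {
    case ENC_ASCII:
      if (cp > 0x7F) return 0;
      out[0] = (unsigned char) cp;
      return 1;
    case ENC_LATIN1:
      if (cp > 0xFF) return 0;
      out[0] = (unsigned char) cp;
      return 1;
    case ENC_UTF8:
      if (cp < 0x80) { out[0] = (unsigned char) cp; return 1; }
      if (cp < 0x800)
        {
          out[0] = (unsigned char) (0xC0 | (cp >> 6));
          out[1] = (unsigned char) (0x80 | (cp & 0x3F));
          return 2;
        }
      if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogates */
      if (cp < 0x10000)
        {
          out[0] = (unsigned char) (0xE0 | (cp >> 12));
          out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          out[2] = (unsigned char) (0x80 | (cp & 0x3F));
          return 3;
        }
      if (cp <= 0x10FFFF)
        {
          out[0] = (unsigned char) (0xF0 | (cp >> 18));
          out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
          out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          out[3] = (unsigned char) (0x80 | (cp & 0x3F));
          return 4;
        }
      return 0;
    }
  return 0;
}

/* Convert LEN code points into a malloc'd, NUL-terminated byte string
   in encoding ENC; *OUTLEN receives the byte count.  With ILSEQ_ERROR an
   unrepresentable character makes the call return NULL; with
   ILSEQ_SUBSTITUTE it is replaced by '?'. */
char *to_stringn (const uint32_t *str, size_t len, size_t *outlen,
                  encoding_t enc, ilseq_t handler)
{
  unsigned char *buf = malloc (4 * len + 1);
  size_t n = 0;
  if (buf == NULL)
    return NULL;
  for (size_t i = 0; i < len; i++)
    {
      size_t k = encode_one (str[i], enc, buf + n);
      if (k == 0)
        {
          if (handler == ILSEQ_ERROR)
            {
              free (buf);
              return NULL;
            }
          buf[n] = '?';
          k = 1;
        }
      n += k;
    }
  buf[n] = '\0';
  if (outlen != NULL)
    *outlen = n;
  return (char *) buf;
}
```

The locale-based variant then becomes just one call of the generic form, with the locale's encoding and a default policy filled in.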

> +SCM_DEFINE (scm_setencoding, "setencoding", 1, 1, 0,
> +	    (SCM enc, SCM port),

How about `set-port-encoding!' (for consistency with other procedure
names), with PORT being a required argument?

>  	    "Sets the character encoding that will be used to interpret all\n"
> -	    "port I/O.  Normally, one would set this using @code{setlocale},\n"
> +	    "port I/O.  Normally, a new port would inherit the encoding\n"
> +	    "set by using @code{setlocale},\n"

It would seem simpler to me if a port's encoding defaulted to ASCII,
instead of the current locale's encoding.  That would make semantics
clearer and easier to follow.  What do you think?

> +SCM_DEFINE (scm_setbinary, "setbinary", 0, 1, 0,
> +	    (SCM port),

Likewise, `set-port-binary-mode!' or some such.

> +char *
> +scm_scan_for_encoding (SCM port)

Since it's `SCM_INTERNAL', I'd suggest `scm_i_scan_for_encoding()'.

In addition, from a memory management viewpoint, it might be easier to
have it return an `SCM'.
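A hedged sketch of what such a scanner might look like, and of the memory-management cost of returning `char *`. It assumes an Emacs-style `coding: NAME` cookie searched for in the first 500 bytes; both the cookie syntax and the limit are assumptions for illustration, and `scan_for_encoding` here is a plain-C stand-in, not the function in the branch. Because the result is `malloc`'d, every caller must `free()` it, which is the burden a garbage-collected `SCM` return value would remove.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define SCAN_LIMIT 500   /* assumed number of bytes to inspect */

/* Return a malloc'd copy of the encoding name following "coding:" in
   the first SCAN_LIMIT bytes of BUF, or NULL if there is none.  The
   caller owns the result and must free() it. */
char *scan_for_encoding (const char *buf, size_t len)
{
  static const char key[] = "coding:";
  size_t keylen = sizeof key - 1;

  if (len > SCAN_LIMIT)
    len = SCAN_LIMIT;

  for (size_t i = 0; i + keylen <= len; i++)
    {
      if (memcmp (buf + i, key, keylen) != 0)
        continue;

      size_t p = i + keylen;
      while (p < len && (buf[p] == ' ' || buf[p] == '\t'))
        p++;                                    /* skip blanks */

      size_t start = p;
      while (p < len && (isalnum ((unsigned char) buf[p])
                         || buf[p] == '-' || buf[p] == '_'))
        p++;                                    /* encoding name */
      if (p == start)
        return NULL;

      char *name = malloc (p - start + 1);
      if (name == NULL)
        return NULL;
      memcpy (name, buf + start, p - start);
      name[p - start] = '\0';
      return name;
    }
  return NULL;
}
```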

> -SCM_DEFINE (scm_set_conversion_error_behavior_x, "set-conversion-error-behavior!",

I'm wondering whether this should be a per-port (eventually,
per-transcoder) setting.  What's your opinion?
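As a rough picture of the per-port alternative, the sketch below stores both the encoding and the conversion-error policy in each port, and makes the port a required argument of the setters, as suggested for `set-port-encoding!`. The struct layout and the names `set_port_encoding` and `set_port_conversion_strategy` are hypothetical, chosen only to show that changing one port leaves the others untouched.

```c
#include <string.h>

/* Hypothetical per-port conversion-error policies. */
typedef enum { CONV_ERROR, CONV_SUBSTITUTE, CONV_ESCAPE } conv_handler_t;

struct port
{
  char encoding[32];              /* e.g. "UTF-8", "ISO-8859-1" */
  conv_handler_t ilseq_handler;   /* what to do on unconvertible chars */
};

/* PORT is required.  Returns 0 on success, -1 on invalid arguments. */
int set_port_encoding (struct port *port, const char *enc)
{
  if (port == NULL || enc == NULL || strlen (enc) >= sizeof port->encoding)
    return -1;
  strcpy (port->encoding, enc);
  return 0;
}

/* Per-port replacement for a single global error-behavior setting. */
int set_port_conversion_strategy (struct port *port, conv_handler_t h)
{
  if (port == NULL)
    return -1;
  port->ilseq_handler = h;
  return 0;
}
```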

Thanks,
Ludo'.





* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
  2009-06-05 14:10 ` [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2 Ludovic Courtès
@ 2009-06-05 14:26   ` Mike Gran
  2009-06-06 13:23     ` Ludovic Courtès
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Gran @ 2009-06-05 14:26 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Guile Devel

On Fri, 2009-06-05 at 16:10 +0200, Ludovic Courtès wrote:
> Hi Mike,
> 
> A few random thoughts:
> 
> "Michael Gran" <spk121@yahoo.com> writes:
> 
> > -  buf = scm_to_locale_stringn (scm_c_substring (str, start, end), &len);
> > +  buf = scm_to_encoded_stringn (scm_c_substring (str, start, end), &len, 
> > +				pt->encoding, pt->ilseq_handler);
> 
> I'd call that `scm_to_stringn ()' since it's the most generic form (and
> a string is always "encoded", anyway).

OK

> 
> > +SCM_DEFINE (scm_setencoding, "setencoding", 1, 1, 0,
> > +	    (SCM enc, SCM port),
> 
> How about `set-port-encoding!' (for consistency with other procedure
> names), with PORT being a required argument?
> 
> >  	    "Sets the character encoding that will be used to interpret all\n"
> > -	    "port I/O.  Normally, one would set this using @code{setlocale},\n"
> > +	    "port I/O.  Normally, a new port would inherit the encoding\n"
> > +	    "set by using @code{setlocale},\n"
> 
> It would seem simpler to me if a port's encoding defaulted to ASCII,
> instead of the current locale's encoding.  That would make semantics
> clearer and easier to follow.  What do you think?
> 

It would make things easier to follow, but pure 7-bit ASCII would hurt
backwards compatibility.  The libunistring conversion funcs do raise
errors when 8-bit chars are converted into ASCII.  ISO-8859-1 could be
better so that 8-bit chars wouldn't cause errors by default.
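The difference between the two defaults can be shown with two tiny decoders. This is a hand-rolled sketch, not libunistring's API: strict ASCII has no meaning for bytes 0x80 to 0xFF and must report an error, whereas ISO-8859-1 maps every byte value to the code point of the same value, so decoding can never fail.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode LEN bytes as 7-bit ASCII into OUT.  Returns the number of code
   points written, or -1 at the first byte >= 0x80. */
long decode_ascii (const unsigned char *in, size_t len, uint32_t *out)
{
  for (size_t i = 0; i < len; i++)
    {
      if (in[i] > 0x7F)
        return -1;            /* 8-bit byte: conversion error */
      out[i] = in[i];
    }
  return (long) len;
}

/* Decode LEN bytes as ISO-8859-1 (Latin-1).  Byte value N is code
   point U+00NN, so every input is valid. */
long decode_latin1 (const unsigned char *in, size_t len, uint32_t *out)
{
  for (size_t i = 0; i < len; i++)
    out[i] = in[i];
  return (long) len;
}
```

For example, the bytes `c a f 0xE9` fail under `decode_ascii` but decode to "café" under `decode_latin1`.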

Also, I guess setlocale is where one should modify the encodings of
current-input-port, current-output-port and current-error-port,
since they need special handling.

> > +SCM_DEFINE (scm_setbinary, "setbinary", 0, 1, 0,
> > +	    (SCM port),
> 
> Likewise, `set-port-binary-mode!' or some such.
> 
> > +char *
> > +scm_scan_for_encoding (SCM port)
> 
> Since it's `SCM_INTERNAL', I'd suggest `scm_i_scan_for_encoding()'.
> 

OK 

> In addition, from a memory management viewpoint, it might be easier to
> have it return an `SCM'.

OK

> 
> > -SCM_DEFINE (scm_set_conversion_error_behavior_x, "set-conversion-error-behavior!",
> 
> I'm wondering whether this should be a per-port (eventually,
> per-transcoder) setting.  What's your opinion?

I believe that is how it should work.  I'm working toward that.

> 
> Thanks,
> Ludo'.

Thanks,

Mike





* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
  2009-06-05 14:26   ` Mike Gran
@ 2009-06-06 13:23     ` Ludovic Courtès
  2009-06-08  4:51       ` Mike Gran
  0 siblings, 1 reply; 5+ messages in thread
From: Ludovic Courtès @ 2009-06-06 13:23 UTC (permalink / raw)
  To: guile-devel

Hi Mike,

Mike Gran <spk121@yahoo.com> writes:

> It would make things easier to follow, but pure 7-bit ASCII would hurt
> backwards compatibility.  The libunistring conversion funcs do raise
> errors when 8-bit chars are converted into ASCII.  ISO-8859-1 could be
> better so that 8-bit chars wouldn't cause errors by default.

Right, Latin-1 would be saner.

> Also, I guess setlocale is where one should modify the encodings of
> current-input-port, current-output-port and current-error-port,
> since they need special handling.

These could be Latin-1 when they are created, just like any other ports,
and soon after they would be switched to the current locale's encoding.
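A minimal sketch of that two-step scheme, assuming a simplified port record: the standard ports are created as ISO-8859-1 like any other port, and once the program has called `setlocale()` they adopt the codeset reported by `nl_langinfo(CODESET)`. Only `setlocale()` and `nl_langinfo()` are real POSIX calls; the struct and the function names are hypothetical.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

struct port { char encoding[64]; };   /* hypothetical, simplified */

static struct port std_in, std_out, std_err;

static void set_enc (struct port *p, const char *enc)
{
  strncpy (p->encoding, enc, sizeof p->encoding - 1);
  p->encoding[sizeof p->encoding - 1] = '\0';
}

/* Step 1: standard ports start out as Latin-1, like any other port. */
void init_standard_ports (void)
{
  set_enc (&std_in, "ISO-8859-1");
  set_enc (&std_out, "ISO-8859-1");
  set_enc (&std_err, "ISO-8859-1");
}

/* Step 2: after setlocale(), switch them to the locale's codeset. */
void adopt_locale_encoding (void)
{
  const char *codeset = nl_langinfo (CODESET);
  if (codeset == NULL || *codeset == '\0')
    return;
  set_enc (&std_in, codeset);
  set_enc (&std_out, codeset);
  set_enc (&std_err, codeset);
}
```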

Thanks,
Ludo'.






* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
  2009-06-06 13:23     ` Ludovic Courtès
@ 2009-06-08  4:51       ` Mike Gran
  2009-06-18 20:18         ` Ludovic Courtès
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Gran @ 2009-06-08  4:51 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Guile Devel

On Sat, 2009-06-06 at 15:23 +0200, Ludovic Courtès wrote:
> Hi Mike,
> 
> Mike Gran <spk121@yahoo.com> writes:
> 
> > It would make things easier to follow, but pure 7-bit ASCII would hurt
> > backwards compatibility.  The libunistring conversion funcs do raise
> > errors when 8-bit chars are converted into ASCII.  ISO-8859-1 could be
> > better so that 8-bit chars wouldn't cause errors by default.
> 
> Right, Latin-1 would be saner.
> 

Setting a port's default encoding to Latin-1 doesn't work out so well in
practice.  For example, ports are used as the backend of procedures like
with-input-from-file and with-output-to-string.  Those procedures don't
currently take any encoding information and presume some sort of default
encoding.

One could easily imagine a case where the locale is set to en_US.UTF-8
and then with-input-from-file is called.  If non-Latin-1 characters
appear in the file, the port will throw a conversion error.  I think
that would violate the principle of least surprise.

I prefer having a port inherit its default encoding from the last call
to setlocale.  This isn't a violation of R6RS Port I/O, since it states
that the "native" transcoding may be both implementation dependent and
locale-dependent.
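One way to picture "inherit from the last call to setlocale" is a wrapper that records the new codeset whenever the character-type locale changes, so that every port opened afterwards (for instance by with-input-from-file) copies it. This is a sketch under assumptions: `locale_wrapper`, `port_open` and the port struct are hypothetical names, and only `setlocale()` and `nl_langinfo(CODESET)` are real POSIX calls.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

struct port { char encoding[64]; };   /* hypothetical, simplified */

/* Encoding used for ports created before any setlocale() call. */
static char default_port_encoding[64] = "ISO-8859-1";

/* Call setlocale() and, if LC_CTYPE may have changed, remember the new
   codeset as the default for ports created from now on. */
const char *locale_wrapper (int category, const char *locale)
{
  const char *result = setlocale (category, locale);
  if (result != NULL && (category == LC_ALL || category == LC_CTYPE))
    snprintf (default_port_encoding, sizeof default_port_encoding,
              "%s", nl_langinfo (CODESET));
  return result;
}

/* A new port copies whatever default is current when it is created. */
struct port port_open (void)
{
  struct port p;
  snprintf (p.encoding, sizeof p.encoding, "%s", default_port_encoding);
  return p;
}
```

Ports opened before the wrapper is called keep ISO-8859-1; ports opened afterwards get the locale's codeset, with no change to the signatures of procedures such as with-input-from-file.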

Less preferable, IMHO, is to modify all the with-input-from-* and
with-output-to-* procedures to take optional explicit encodings.

Thanks,

Mike 





* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
  2009-06-08  4:51       ` Mike Gran
@ 2009-06-18 20:18         ` Ludovic Courtès
  0 siblings, 0 replies; 5+ messages in thread
From: Ludovic Courtès @ 2009-06-18 20:18 UTC (permalink / raw)
  To: guile-devel

Hello!

Mike Gran <spk121@yahoo.com> writes:

> Setting a port's default encoding to Latin-1 doesn't work out so well in
> practice.  For example, ports are used as the backend of procedures like
> with-input-from-file and with-output-to-string.  Those procedures don't
> currently take any encoding information and presume some sort of default
> encoding.

Ooh, right.

> I prefer having a port inherit its default encoding from the last call
> to setlocale.

Inherit from the current locale, yes, that sounds preferable.

Thanks,
Ludo'.





