* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
[not found] <E1MCZex-00070s-SB@cvs.savannah.gnu.org>
@ 2009-06-05 14:10 ` Ludovic Courtès
2009-06-05 14:26 ` Mike Gran
0 siblings, 1 reply; 5+ messages in thread
From: Ludovic Courtès @ 2009-06-05 14:10 UTC (permalink / raw)
To: Michael Gran; +Cc: guile-devel
Hi Mike,
A few random thoughts:
"Michael Gran" <spk121@yahoo.com> writes:
> - buf = scm_to_locale_stringn (scm_c_substring (str, start, end), &len);
> + buf = scm_to_encoded_stringn (scm_c_substring (str, start, end), &len,
> + pt->encoding, pt->ilseq_handler);
I'd call that `scm_to_stringn ()' since it's the most generic form (and
a string is always "encoded", anyway).
> +SCM_DEFINE (scm_setencoding, "setencoding", 1, 1, 0,
> + (SCM enc, SCM port),
How about `set-port-encoding!' (for consistency with other procedure
names), with PORT being a required argument?
> "Sets the character encoding that will be used to interpret all\n"
> - "port I/O. Normally, one would set this using @code{setlocale},\n"
> + "port I/O. Normally, a new port would inherit the encoding\n"
> + "set by using @code{setlocale},\n"
It would seem simpler to me if a port's encoding defaulted to ASCII,
instead of the current locale's encoding. That would make semantics
clearer and easier to follow. What do you think?
> +SCM_DEFINE (scm_setbinary, "setbinary", 0, 1, 0,
> + (SCM port),
Likewise, `set-port-binary-mode!' or some such.
> +char *
> +scm_scan_for_encoding (SCM port)
Since it's `SCM_INTERNAL', I'd suggest `scm_i_scan_for_encoding()'.
In addition, from a memory management viewpoint, it might be easier to
have it return an `SCM'.
> -SCM_DEFINE (scm_set_conversion_error_behavior_x, "set-conversion-error-behavior!",
I'm wondering whether this should be a per-port (eventually,
per-transcoder) setting. What's your opinion?
Thanks,
Ludo'.
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
2009-06-05 14:10 ` [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2 Ludovic Courtès
@ 2009-06-05 14:26 ` Mike Gran
2009-06-06 13:23 ` Ludovic Courtès
0 siblings, 1 reply; 5+ messages in thread
From: Mike Gran @ 2009-06-05 14:26 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: Guile Devel
On Fri, 2009-06-05 at 16:10 +0200, Ludovic Courtès wrote:
> Hi Mike,
>
> A few random thoughts:
>
> "Michael Gran" <spk121@yahoo.com> writes:
>
> > - buf = scm_to_locale_stringn (scm_c_substring (str, start, end), &len);
> > + buf = scm_to_encoded_stringn (scm_c_substring (str, start, end), &len,
> > + pt->encoding, pt->ilseq_handler);
>
> I'd call that `scm_to_stringn ()' since it's the most generic form (and
> a string is always "encoded", anyway).
OK
>
> > +SCM_DEFINE (scm_setencoding, "setencoding", 1, 1, 0,
> > + (SCM enc, SCM port),
>
> How about `set-port-encoding!' (for consistency with other procedure
> names), with PORT being a required argument?
>
> > "Sets the character encoding that will be used to interpret all\n"
> > - "port I/O. Normally, one would set this using @code{setlocale},\n"
> > + "port I/O. Normally, a new port would inherit the encoding\n"
> > + "set by using @code{setlocale},\n"
>
> It would seem simpler to me if a port's encoding defaulted to ASCII,
> instead of the current locale's encoding. That would make semantics
> clearer and easier to follow. What do you think?
>
It would make things easier to follow, but, pure 7-bit ASCII would hurt
backwards compatibility. The libunistring conversion funcs do raise
errors when 8-bit chars are converted into ASCII. ISO-8859-1 could be
better so that 8-bit chars wouldn't cause errors by default.
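The difference can be sketched in a few lines of plain C. This is a hand-written illustration, not libunistring's API; `decode_ascii_strict` and `decode_latin1` are hypothetical names. A strict ASCII decoder has to reject any byte above 0x7F, while an ISO-8859-1 decoder accepts every byte value:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: strictly decode 7-bit ASCII into code points.
   Returns 0 on success, -1 at the first byte above 0x7F.  */
static int
decode_ascii_strict (const unsigned char *in, size_t len, uint32_t *out)
{
  for (size_t i = 0; i < len; i++)
    {
      if (in[i] > 0x7F)
        return -1;              /* 8-bit byte: not valid ASCII */
      out[i] = in[i];
    }
  return 0;
}

/* Hypothetical helper: decode ISO-8859-1 into code points.  Every byte
   value 0x00..0xFF maps to the code point with the same number, so
   there is no failure case.  */
static void
decode_latin1 (const unsigned char *in, size_t len, uint32_t *out)
{
  for (size_t i = 0; i < len; i++)
    out[i] = in[i];
}
```

With the bytes `'c' 'a' 'f' 0xE9`, the strict ASCII decoder fails on the last byte, whereas the Latin-1 decoder yields U+00E9.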
Also, I guess setlocale is where one should modify the encodings of
current-input-port, current-output-port and current-error-port,
since they need special handling.
> > +SCM_DEFINE (scm_setbinary, "setbinary", 0, 1, 0,
> > + (SCM port),
>
> Likewise, `set-port-binary-mode!' or some such.
>
> > +char *
> > +scm_scan_for_encoding (SCM port)
>
> Since it's `SCM_INTERNAL', I'd suggest `scm_i_scan_for_encoding()'.
>
OK
> In addition, from a memory management viewpoint, it might be easier to
> have it return an `SCM'.
OK
>
> > -SCM_DEFINE (scm_set_conversion_error_behavior_x, "set-conversion-error-behavior!",
>
> I'm wondering whether this should be a per-port (eventually,
> per-transcoder) setting. What's your opinion?
I believe that is how it should work. I'm working toward that.
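A minimal sketch of what a per-port setting could look like, assuming a much-reduced port structure. The field names `encoding` and `ilseq_handler` follow the patch quoted above; the enum values and `port_encode_ascii` are hypothetical and only handle an ASCII target:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy values; the real names in the branch may differ.  */
enum ilseq_handler { ILSEQ_ERROR, ILSEQ_QUESTION_MARK };

/* Hypothetical, much-reduced port: only the two fields discussed here.  */
struct port
{
  const char *encoding;         /* carried along, unused in this sketch */
  enum ilseq_handler ilseq_handler;
};

/* Encode code points as ASCII for PT, consulting the port's own policy.
   Returns the number of bytes written, or -1 if the policy is
   ILSEQ_ERROR and a code point above 0x7F is found.  */
static long
port_encode_ascii (const struct port *pt, const uint32_t *in, size_t len,
                   char *out)
{
  for (size_t i = 0; i < len; i++)
    {
      if (in[i] <= 0x7F)
        out[i] = (char) in[i];
      else if (pt->ilseq_handler == ILSEQ_QUESTION_MARK)
        out[i] = '?';           /* substitute instead of failing */
      else
        return -1;              /* strict port: report the error */
    }
  return (long) len;
}
```

Two ports with the same encoding can then behave differently on the same text: one reports an error, the other substitutes `?`, without any global state.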
>
> Thanks,
> Ludo'.
Thanks,
Mike
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
2009-06-05 14:26 ` Mike Gran
@ 2009-06-06 13:23 ` Ludovic Courtès
2009-06-08 4:51 ` Mike Gran
0 siblings, 1 reply; 5+ messages in thread
From: Ludovic Courtès @ 2009-06-06 13:23 UTC (permalink / raw)
To: guile-devel
Hi Mike,
Mike Gran <spk121@yahoo.com> writes:
> It would make things easier to follow, but, pure 7-bit ASCII would hurt
> backwards compatibility. The libunistring conversion funcs do raise
> errors when 8-bit chars are converted into ASCII. ISO-8859-1 could be
> better so that 8-bit chars wouldn't cause errors by default.
Right, Latin-1 would be saner.
> Also, I guess setlocale is where one should modify the encodings of
> current-input-port, current-output-port and current-error-port,
> since they need special handling.
These could be Latin-1 when they are created, just like any other ports,
and soon after they would be switched to the current locale's encoding.
Thanks,
Ludo'.
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
2009-06-06 13:23 ` Ludovic Courtès
@ 2009-06-08 4:51 ` Mike Gran
2009-06-18 20:18 ` Ludovic Courtès
0 siblings, 1 reply; 5+ messages in thread
From: Mike Gran @ 2009-06-08 4:51 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: Guile Devel
On Sat, 2009-06-06 at 15:23 +0200, Ludovic Courtès wrote:
> Hi Mike,
>
> Mike Gran <spk121@yahoo.com> writes:
>
> > It would make things easier to follow, but, pure 7-bit ASCII would hurt
> > backwards compatibility. The libunistring conversion funcs do raise
> > errors when 8-bit chars are converted into ASCII. ISO-8859-1 could be
> > better so that 8-bit chars wouldn't cause errors by default.
>
> Right, Latin-1 would be saner.
>
Setting a port's default encoding to Latin-1 doesn't work out so well in
practice. For example, ports are used as the backend of procedures like
with-input-from-file and with-output-to-string. Those procedures don't
currently take any encoding information and presume some sort of default
encoding.
One could easily imagine a case where the locale is set to en_US.UTF-8
and then with-input-from-file is called. If non-Latin-1 characters
appear in the file, the port will throw a conversion error. I think
that would violate the principle of least surprise.
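The underlying limitation is that Latin-1 can represent only U+0000 through U+00FF. A hand-written sketch of the encoding side (`encode_latin1` is a hypothetical helper, not a Guile or libunistring function) shows where such a conversion error would come from:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode code points as ISO-8859-1.
   Returns 0 on success, -1 at the first code point above U+00FF.  */
static int
encode_latin1 (const uint32_t *in, size_t len, unsigned char *out)
{
  for (size_t i = 0; i < len; i++)
    {
      if (in[i] > 0xFF)
        return -1;              /* e.g. U+20AC EURO SIGN has no Latin-1 byte */
      out[i] = (unsigned char) in[i];
    }
  return 0;
}
```

U+00E9 encodes fine, but U+20AC, which is perfectly ordinary in a UTF-8 locale, makes the conversion fail.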
I prefer having a port inherit its default encoding from the last call
to setlocale. This isn't a violation of R6RS Port I/O, since it states
that the "native" transcoding may be both implementation dependent and
locale-dependent.
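As a rough sketch of the inheritance idea, assuming a POSIX system: `nl_langinfo (CODESET)` reports the codeset of the current locale, so a new port can copy that name when it is created. `struct demo_port` and `port_init_from_locale` are hypothetical names used only for illustration:

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical, much-reduced port holding only its encoding name.  */
struct demo_port
{
  char encoding[64];
};

/* Give a new port the codeset of whatever locale is current right now.
   The name is copied because the string returned by nl_langinfo may be
   overwritten by later calls to nl_langinfo or setlocale.  */
static void
port_init_from_locale (struct demo_port *pt)
{
  snprintf (pt->encoding, sizeof pt->encoding, "%s", nl_langinfo (CODESET));
}
```

A program would typically call `setlocale (LC_ALL, "")` once at startup; ports created afterwards then pick up the codeset of the user's locale, for example UTF-8 under en_US.UTF-8.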
Less preferable, IMHO, is to modify all the with-input-from-* and
with-output-to-* procedures to take optional explicit encodings.
Thanks,
Mike
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
2009-06-08 4:51 ` Mike Gran
@ 2009-06-18 20:18 ` Ludovic Courtès
0 siblings, 0 replies; 5+ messages in thread
From: Ludovic Courtès @ 2009-06-18 20:18 UTC (permalink / raw)
To: guile-devel
Hello!
Mike Gran <spk121@yahoo.com> writes:
> Setting a port's default encoding to Latin-1 doesn't work out so well in
> practice. For example, ports are used as the backend of procedures like
> with-input-from-file and with-output-to-string. Those procedures don't
> currently take any encoding information and presume some sort of default
> encoding.
Ooh, right.
> I prefer having a port inherit its default encoding from the last call
> to setlocale.
Inherit from the current locale, yes, that sounds preferable.
Thanks,
Ludo'.