* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
  [not found] <E1MCZex-00070s-SB@cvs.savannah.gnu.org>
From: Ludovic Courtès @ 2009-06-05 14:10 UTC
To: Michael Gran; +Cc: guile-devel

Hi Mike,

A few random thoughts:

"Michael Gran" <spk121@yahoo.com> writes:

> - buf = scm_to_locale_stringn (scm_c_substring (str, start, end), &len);
> + buf = scm_to_encoded_stringn (scm_c_substring (str, start, end), &len,
> + pt->encoding, pt->ilseq_handler);

I'd call that `scm_to_stringn ()' since it's the most generic form (and
a string is always "encoded", anyway).

> +SCM_DEFINE (scm_setencoding, "setencoding", 1, 1, 0,
> + (SCM enc, SCM port),

How about `set-port-encoding!' (for consistency with other procedure
names), with PORT being a required argument?

> "Sets the character encoding that will be used to interpret all\n"
> - "port I/O. Normally, one would set this using @code{setlocale},\n"
> + "port I/O. Normally, a new port would inherit the encoding\n"
> + "set by using @code{setlocale},\n"

It would seem simpler to me if a port's encoding defaulted to ASCII,
instead of the current locale's encoding. That would make semantics
clearer and easier to follow. What do you think?

> +SCM_DEFINE (scm_setbinary, "setbinary", 0, 1, 0,
> + (SCM port),

Likewise, `set-port-binary-mode!' or some such.

> +char *
> +scm_scan_for_encoding (SCM port)

Since it's `SCM_INTERNAL', I'd suggest `scm_i_scan_for_encoding()'.

In addition, from a memory management viewpoint, it might be easier to
have it return an `SCM'.

> -SCM_DEFINE (scm_set_conversion_error_behavior_x, "set-conversion-error-behavior!",

I'm wondering whether this should be a per-port (eventually,
per-transcoder) setting. What's your opinion?

Thanks,
Ludo'.

* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
From: Mike Gran @ 2009-06-05 14:26 UTC
To: Ludovic Courtès; +Cc: Guile Devel

On Fri, 2009-06-05 at 16:10 +0200, Ludovic Courtès wrote:
> Hi Mike,
>
> A few random thoughts:
>
> "Michael Gran" <spk121@yahoo.com> writes:
>
> > - buf = scm_to_locale_stringn (scm_c_substring (str, start, end), &len);
> > + buf = scm_to_encoded_stringn (scm_c_substring (str, start, end), &len,
> > + pt->encoding, pt->ilseq_handler);
>
> I'd call that `scm_to_stringn ()' since it's the most generic form (and
> a string is always "encoded", anyway).

OK

> > +SCM_DEFINE (scm_setencoding, "setencoding", 1, 1, 0,
> > + (SCM enc, SCM port),
>
> How about `set-port-encoding!' (for consistency with other procedure
> names), with PORT being a required argument?
>
> > "Sets the character encoding that will be used to interpret all\n"
> > - "port I/O. Normally, one would set this using @code{setlocale},\n"
> > + "port I/O. Normally, a new port would inherit the encoding\n"
> > + "set by using @code{setlocale},\n"
>
> It would seem simpler to me if a port's encoding defaulted to ASCII,
> instead of the current locale's encoding. That would make semantics
> clearer and easier to follow. What do you think?

It would make things easier to follow, but, pure 7-bit ASCII would hurt
backwards compatibility. The libunistring conversion funcs do raise
errors when 8-bit chars are converted into ASCII. ISO-8859-1 could be
better so that 8-bit chars wouldn't cause errors by default.

Also, I guess setlocale is where one should modify the encodings of
current-input-port, current-output-port and current-error-port, since
they need special handling.

> > +SCM_DEFINE (scm_setbinary, "setbinary", 0, 1, 0,
> > + (SCM port),
>
> Likewise, `set-port-binary-mode!' or some such.
>
> > +char *
> > +scm_scan_for_encoding (SCM port)
>
> Since it's `SCM_INTERNAL', I'd suggest `scm_i_scan_for_encoding()'.

OK

> In addition, from a memory management viewpoint, it might be easier to
> have it return an `SCM'.

OK

> > -SCM_DEFINE (scm_set_conversion_error_behavior_x, "set-conversion-error-behavior!",
>
> I'm wondering whether this should be a per-port (eventually,
> per-transcoder) setting. What's your opinion?

I believe that is how it should work. I'm working toward that.

> Thanks,
> Ludo'.

Thanks,

Mike

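The ASCII versus ISO-8859-1 point in the message above can be illustrated with a minimal C sketch. It is not Guile or libunistring code: `sketch_encoding` and `sketch_decode` are hypothetical names invented here, and the only behaviour assumed is the one described in the thread, namely that a strict 7-bit ASCII conversion rejects bytes above 0x7F while ISO-8859-1 assigns a character to every byte value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding tags for this sketch; not part of Guile's API. */
enum sketch_encoding { SKETCH_ASCII, SKETCH_LATIN1 };

/* Decode LEN bytes from IN into code points in OUT (room for LEN entries).
   Returns the number of code points written, or -1 if a byte has no
   mapping in ENC.  */
long
sketch_decode (enum sketch_encoding enc, const unsigned char *in,
               size_t len, uint32_t *out)
{
  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < len; i++)
    {
      /* Strict 7-bit ASCII has no mapping for bytes 0x80..0xFF.  */
      if (enc == SKETCH_ASCII && in[i] > 0x7F)
        return -1;
      /* In both encodings a valid byte maps to the code point with the
         same numeric value.  */
      out[i] = in[i];
    }
  return (long) len;
}
```

With the four bytes `'c' 'a' 'f' 0xE9`, the strict ASCII path reports an error at the last byte, while the ISO-8859-1 path yields four code points ending in U+00E9. That is the backwards-compatibility concern: existing 8-bit data would start failing under an ASCII default.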
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
From: Ludovic Courtès @ 2009-06-06 13:23 UTC
To: guile-devel

Hi Mike,

Mike Gran <spk121@yahoo.com> writes:

> It would make things easier to follow, but, pure 7-bit ASCII would hurt
> backwards compatibility. The libunistring conversion funcs do raise
> errors when 8-bit chars are converted into ASCII. ISO-8859-1 could be
> better so that 8-bit chars wouldn't cause errors by default.

Right, Latin-1 would be saner.

> Also, I guess setlocale is where one should modify the encodings of
> current-input-port, current-output-port and current-error-port,
> since they need special handling.

These could be Latin-1 when they are created, just like any other ports,
and soon after they would be switched to the current locale's encoding.

Thanks,
Ludo'.

* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
From: Mike Gran @ 2009-06-08 4:51 UTC
To: Ludovic Courtès; +Cc: Guile Devel

On Sat, 2009-06-06 at 15:23 +0200, Ludovic Courtès wrote:
> Hi Mike,
>
> Mike Gran <spk121@yahoo.com> writes:
>
> > It would make things easier to follow, but, pure 7-bit ASCII would hurt
> > backwards compatibility. The libunistring conversion funcs do raise
> > errors when 8-bit chars are converted into ASCII. ISO-8859-1 could be
> > better so that 8-bit chars wouldn't cause errors by default.
>
> Right, Latin-1 would be saner.

Setting a port's default encoding to Latin-1 doesn't work out so well in
practice. For example, ports are used as the backend of procedures like
with-input-from-file and with-output-to-string. Those procedures don't
currently take any encoding information and presume some sort of default
encoding.

One could easily imagine a case where the locale is set to en_US.UTF-8
and then with-input-from-file is called. If non-Latin-1 characters
appear in the file, the port will throw a conversion error. I think
that would violate the principle of least surprise.

I prefer having a port inherit its default encoding from the last call
to setlocale. This isn't a violation of R6RS Port I/O, since it states
that the "native" transcoding may be both implementation dependent and
locale-dependent.

Less preferable, IMHO, is to modify all the with-input-from-* and
with-output-to-* procedures to take optional explicit encodings.

Thanks,

Mike

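One way the Latin-1 mismatch described above can surface is on the encoding side, as in a with-output-to-string style port. The following is a rough C sketch under that assumption only. `sketch_put_latin1` and `sketch_put_utf8` are hypothetical helpers written for this illustration; they make no claim about how Guile ports or libunistring actually report the failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as ISO-8859-1 into OUT (at least 1 byte).
   Returns 1 on success, or -1 if the code point is not representable.  */
int
sketch_put_latin1 (uint32_t cp, unsigned char *out)
{
  assert (out != NULL);
  if (cp > 0xFF)
    return -1;                  /* e.g. Greek, Cyrillic, CJK */
  out[0] = (unsigned char) cp;
  return 1;
}

/* Encode one code point as UTF-8 into OUT (at least 4 bytes).
   Returns the number of bytes written, or -1 for an invalid code point.  */
int
sketch_put_utf8 (uint32_t cp, unsigned char *out)
{
  assert (out != NULL);
  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return -1;                  /* out of range or a surrogate */
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  out[0] = (unsigned char) (0xF0 | (cp >> 18));
  out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (cp & 0x3F));
  return 4;
}
```

For U+03BB (Greek small lambda) the Latin-1 helper has nothing to emit, while the UTF-8 helper produces two bytes. A user working in en_US.UTF-8 would hit the first case whenever a port stayed at a Latin-1 default.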
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
From: Ludovic Courtès @ 2009-06-18 20:18 UTC
To: guile-devel

Hello!

Mike Gran <spk121@yahoo.com> writes:

> Setting a port's default encoding to Latin-1 doesn't work out so well in
> practice. For example, ports are used as the backend of procedures like
> with-input-from-file and with-output-to-string. Those procedures don't
> currently take any encoding information and presume some sort of default
> encoding.

Ooh, right.

> I prefer having a port inherit its default encoding from the last call
> to setlocale.

Inherit from the current locale, yes, that sounds preferable.

Thanks,
Ludo'.

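The design the thread settles on, a new port taking its default encoding from the current locale with an explicit per-port override, might look roughly like the C sketch below. The `sketch_port` structure and both functions are hypothetical and are not Guile's port implementation; the only real API used is POSIX `nl_langinfo (CODESET)`, which reports the character encoding of the locale selected by the last `setlocale` call.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical port record holding only the encoding name.  */
struct sketch_port
{
  char encoding[64];
};

/* Explicit override, in the spirit of the proposed `set-port-encoding!'.  */
void
sketch_port_set_encoding (struct sketch_port *port, const char *name)
{
  assert (port != NULL && name != NULL);
  strncpy (port->encoding, name, sizeof port->encoding - 1);
  port->encoding[sizeof port->encoding - 1] = '\0';
}

/* Default: a new port copies whatever encoding the current locale uses.  */
void
sketch_port_init (struct sketch_port *port)
{
  assert (port != NULL);
  sketch_port_set_encoding (port, nl_langinfo (CODESET));
}
```

A caller would run `setlocale (LC_ALL, "")` once at startup, after which every `sketch_port_init` picks up the user's encoding, and code that needs a fixed encoding calls `sketch_port_set_encoding` on that one port. The name returned for a given locale is platform-specific, so the sketch copies it verbatim instead of normalising it.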