* The “binary-friendly” Latin-1
@ 2011-01-24 22:26 Ludovic Courtès
0 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2011-01-24 22:26 UTC (permalink / raw)
To: guile-devel
Hello!
Do we really want to keep:
1. The notion of a “binary-friendly” ISO-8859-1 encoding? It’s
actually mostly gone with the iconv change, since every textual
access goes through iconv. For binary accesses, the right API is
(rnrs io ports) or similar.
2. The #f <=> "ISO-8859-1" equivalence for ‘port-encoding’ and
‘set-port-encoding!’. Likewise, commit
d9544bf012b6e343c80b76bd5761b1583cc106a3 makes ‘port-encoding’
always return a string and pt->encoding always be non-NULL.
Sorry for questioning this now, but these are important questions, I
think.
Thanks,
Ludo’.
* Re: The “binary-friendly” Latin-1
@ 2011-01-24 23:21 Mike Gran
2011-01-25 10:19 ` Hans Aberg
2011-01-25 13:21 ` Ludovic Courtès
0 siblings, 2 replies; 9+ messages in thread
From: Mike Gran @ 2011-01-24 23:21 UTC (permalink / raw)
To: Ludovic Courtès, guile-devel@gnu.org
> From:Ludovic Courtès <ludo@gnu.org>
> To:guile-devel@gnu.org
> Cc:
> Sent:Monday, January 24, 2011 2:26 PM
> Subject:The “binary-friendly” Latin-1
>
> Hello!
>
> Do we really want to keep:
>
> 1. The notion of a “binary-friendly” ISO-8859-1 encoding? It’s
> actually mostly gone with the iconv change, since every textual
> access goes through iconv. For binary accesses, the right API is
> (rnrs io ports) or similar.
An equivalent question is if you care about backward compatibility of
legacy ports. Legacy ports returned strings and were once the only option.
I think it is a bad idea if you are replacing one non-RNRS port system
with another non-RNRS port system. It is less bad if you are replacing
non-RNRS ports with RNRS ports, assuming, of course, that R7RS doesn't just
invent yet another port system.
>
> 2. The #f <=> "ISO-8859-1" equivalence for ‘port-encoding’ and
> ‘set-port-encoding!’. Likewise, commit
> d9544bf012b6e343c80b76bd5761b1583cc106a3 makes ‘port-encoding’
> always return a string and pt->encoding always be non-NULL.
Is the cost of doing the various string comparisons of port-encoding
strings negligible? It was put in as a (premature) optimization.
>
> Sorry for questioning this now, but these are important questions, I
> think.
Indeed.
>
> Thanks,
> Ludo’.
Thanks,
Mike
* Re: The “binary-friendly” Latin-1
2011-01-24 23:21 The “binary-friendly” Latin-1 Mike Gran
@ 2011-01-25 10:19 ` Hans Aberg
2011-01-25 13:21 ` Ludovic Courtès
1 sibling, 0 replies; 9+ messages in thread
From: Hans Aberg @ 2011-01-25 10:19 UTC (permalink / raw)
To: Mike Gran; +Cc: Ludovic Courtès, guile-devel@gnu.org
On 25 Jan 2011, at 00:21, Mike Gran wrote:
>> 2. The #f <=> "ISO-8859-1" equivalence for ‘port-encoding’ and
>> ‘set-port-encoding!’. Likewise, commit
>> d9544bf012b6e343c80b76bd5761b1583cc106a3 makes ‘port-encoding’
>> always return a string and pt->encoding always be non-NULL.
>
> Is the cost of doing the various string comparisons of port-encoding
> strings negligible? It was put in as a (premature) optimization.
If you decide to keep it, one idea that comes to mind is to add #t for
UTF-8, which seems a better default nowadays and in the future.
* Re: The “binary-friendly” Latin-1
2011-01-24 23:21 The “binary-friendly” Latin-1 Mike Gran
2011-01-25 10:19 ` Hans Aberg
@ 2011-01-25 13:21 ` Ludovic Courtès
1 sibling, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2011-01-25 13:21 UTC (permalink / raw)
To: Mike Gran; +Cc: guile-devel@gnu.org
Hello!
>> 1. The notion of a “binary-friendly” ISO-8859-1 encoding? It’s
>> actually mostly gone with the iconv change, since every textual
>> access goes through iconv. For binary accesses, the right API is
>> (rnrs io ports) or similar.
>
> An equivalent question is if you care about backward compatibility of
> legacy ports. Legacy ports returned strings and were once the only option.
You mean if there’s legacy code using a port of unspecified encoding to
read binary data, right?
The iconv change doesn’t break it on GNU/Linux:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (define p (open-bytevector-input-port #vu8(0 1 2 3 255 128)))
scheme@(guile-user)> (set-port-encoding! p "ISO-8859-1")
scheme@(guile-user)> (read-char p)
$14 = #\nul
scheme@(guile-user)> (read-char p)
$15 = #\soh
scheme@(guile-user)> (read-char p)
$16 = #\stx
scheme@(guile-user)> (read-char p)
$17 = #\etx
scheme@(guile-user)> (read-char p)
$18 = #\ÿ
scheme@(guile-user)> (read-char p)
$19 = #\200
scheme@(guile-user)> (read-char p)
$20 = #<eof>
--8<---------------cut here---------------end--------------->8---
However, an iconv implementation may be free to choke on anything that’s
not strictly Latin-1 per
<https://secure.wikimedia.org/wikipedia/en/wiki/ISO-8859-1#Codepage_layout>,
e.g., everything but “ÿ” in the example above, but that seems highly
unlikely.
Anyway, as soon as you use a non-Latin-1 locale, ports get opened under
that locale’s encoding, which makes it practically impossible to do
binary I/O on those ports.
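A minimal sketch of the binary route from point 1, assuming the (rnrs io ports) and (rnrs bytevectors) modules as shipped with Guile 2.0. The binary procedures read raw octets rather than characters, so the result should not depend on the port encoding or the locale. The variable names are only illustrative.

```scheme
(use-modules (rnrs io ports) (rnrs bytevectors))

;; Same six octets as in the transcript above.
(define p (open-bytevector-input-port #vu8(0 1 2 3 255 128)))

;; get-u8 returns one octet as an integer; no character decoding is involved.
(define first-octet (get-u8 p))

;; get-bytevector-n reads up to N further octets into a fresh bytevector.
(define rest-octets (get-bytevector-n p 5))

;; Once the input is exhausted, the binary procedures return the EOF object.
(define at-end (get-u8 p))
```

Here first-octet is 0, rest-octets is #vu8(1 2 3 255 128), and at-end satisfies eof-object?.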
>> 2. The #f <=> "ISO-8859-1" equivalence for ‘port-encoding’ and
>> ‘set-port-encoding!’. Likewise, commit
>> d9544bf012b6e343c80b76bd5761b1583cc106a3 makes ‘port-encoding’
>> always return a string and pt->encoding always be non-NULL.
>
> Is the cost of doing the various string comparisons of port-encoding
> strings negligible? It was put in as a (premature) optimization.
The new code keeps open iconv conversion descriptors for each port and
re-uses them; the only use of pt->encoding is when opening those CDs.
Thanks,
Ludo’.
* Re: The “binary-friendly” Latin-1
@ 2011-01-25 14:01 Mike Gran
2011-01-25 19:58 ` Ludovic Courtès
0 siblings, 1 reply; 9+ messages in thread
From: Mike Gran @ 2011-01-25 14:01 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guile-devel@gnu.org
> From:Ludovic Courtès <ludo@gnu.org>
> Hello!
>
> >> 1. The notion of a “binary-friendly” ISO-8859-1 encoding? It’s
> >> actually mostly gone with the iconv change, since every textual
> >> access goes through iconv. For binary accesses, the right API is
> >> (rnrs io ports) or similar.
> >
> > An equivalent question is if you care about backward compatibility of
> > legacy ports. Legacy ports returned strings and were once the only option.
>
> You mean if there’s legacy code using a port of unspecified encoding to
> read binary data, right?
>
> The iconv change doesn’t break it on GNU/Linux:
Cool. Have you considered what you would want to do with
the 'recv!' procedure?
[...]
> > Is the cost of doing the various string comparisons of port-encoding
> > strings negligible? It was put in as a (premature) optimization.
>
> The new code keeps open iconv conversion descriptors for each port and
> re-uses them; the only use of pt->encoding is when opening those CDs.
Sounds good.
Thanks,
Mike
* Re: The “binary-friendly” Latin-1
2011-01-25 14:01 Mike Gran
@ 2011-01-25 19:58 ` Ludovic Courtès
0 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2011-01-25 19:58 UTC (permalink / raw)
To: Mike Gran; +Cc: guile-devel@gnu.org
Hi!
Mike Gran <spk121@yahoo.com> writes:
>> From:Ludovic Courtès <ludo@gnu.org>
>
>> Hello!
>>
>> >> 1. The notion of a “binary-friendly” ISO-8859-1 encoding? It’s
>> >> actually mostly gone with the iconv change, since every textual
>> >> access goes through iconv. For binary accesses, the right API is
>> >> (rnrs io ports) or similar.
>> >
>> > An equivalent question is if you care about backward compatibility of
>> > legacy ports. Legacy ports returned strings and were once the only option.
>>
>> You mean if there’s legacy code using a port of unspecified encoding to
>> read binary data, right?
>>
>> The iconv change doesn’t break it on GNU/Linux:
>
> Cool. Have you considered what you would want to do with
> the 'recv!' procedure?
Hmm no. Ideas?
Perhaps the second argument could be changed to be a string, in which
case it would issue a deprecation warning, or a bytevector. But when
it’s a string, it’s bound to break unless the program explicitly chooses
a Latin-1 encoding.
Thanks,
Ludo’.
* Re: The “binary-friendly” Latin-1
@ 2011-01-26 18:31 Mike Gran
2011-01-28 11:15 ` Andy Wingo
0 siblings, 1 reply; 9+ messages in thread
From: Mike Gran @ 2011-01-26 18:31 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guile-devel@gnu.org
> >
> > Cool. Have you considered what you would want to do with
> > the 'recv!' procedure?
>
> Hmm no. Ideas?
>
> Perhaps the second argument could be changed to be a string, in which
> case it would issue a deprecation warning, or a bytevector. But when
> it’s a string, it’s bound to break unless the program explicitly chooses
> a Latin-1 encoding.
recv, send, etc are clearly bytevector routines. But, if you want to keep
backward compatibility, you should have it handle both cases.
IMHO, the idea of deprecating the use of strings is the wrong one. Either
be bold and get rid of strings for 2.0, or let them be both bytevector
and string functions for the foreseeable future.
If strings remain an option, there will have to be some mention of the
"binary-friendly" Latin-1 encoding. ;-)
Also, if bytevectors are going to be used for these things, I think
you might consider having (rnrs bytevectors) loaded by default for 2.0.
>
> Thanks,
> Ludo’.
Thanks,
Mike
* Re: The “binary-friendly” Latin-1
2011-01-26 18:31 Mike Gran
@ 2011-01-28 11:15 ` Andy Wingo
2011-01-29 14:29 ` Ludovic Courtès
0 siblings, 1 reply; 9+ messages in thread
From: Andy Wingo @ 2011-01-28 11:15 UTC (permalink / raw)
To: Mike Gran; +Cc: Ludovic Courtès, guile-devel@gnu.org
On Wed 26 Jan 2011 19:31, Mike Gran <spk121@yahoo.com> writes:
> recv, send, etc are clearly bytevector routines. But, if you want to keep
> backward compatibility, you should have it handle both cases.
>
> IMHO, the idea of deprecating the use of strings is the wrong one. Either
> be bold and get rid of strings for 2.0, or let them be both bytevector
> and string functions for the foreseeable future.
>
> If strings remain an option, there will have to be some mention of the
> "binary-friendly" Latin-1 encoding. ;-)
I think recv and send should operate on bytevectors.
I also think that if Guile is compiled with deprecated code enabled,
strings should be supported, noting that the send message should be a
latin-1 string, and that received bytes will be interpreted as latin-1
characters. If passed a string, these functions will issue deprecation
warnings. We change the documentation to speak of bytevectors and not
strings.
This way we provide the correct interface, while not breaking old code,
instead indicating the action that people should take to adapt their
code.
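As a rough illustration of the adaptation step, here is a sketch of converting between Latin-1 strings and bytevectors using only procedures from (rnrs bytevectors) and core Scheme. The helper names latin1-string->bytevector and bytevector->latin1-string are hypothetical and not part of Guile; old code that passed strings could use something like them to move to the bytevector interface.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: encode a string whose characters all lie in the
;; Latin-1 range (code points 0..255) as one octet per character.
;; u8-list->bytevector signals an error for any character above 255.
(define (latin1-string->bytevector str)
  (u8-list->bytevector (map char->integer (string->list str))))

;; Hypothetical helper: interpret each octet as the Latin-1 character
;; with the same code point.
(define (bytevector->latin1-string bv)
  (list->string (map integer->char (bytevector->u8-list bv))))

;; "ab" followed by the character with code point 255.
(define octets
  (latin1-string->bytevector (string #\a #\b (integer->char 255))))

;; Arbitrary octets survive a round trip through a Latin-1 string.
(define text (bytevector->latin1-string #vu8(0 1 2 3 255 128)))
(define round-trip (latin1-string->bytevector text))
```

The round trip is lossless because Latin-1 maps each octet to the character with the same code point, which is the property the “binary-friendly” label refers to.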
Andy
--
http://wingolog.org/
* Re: The “binary-friendly” Latin-1
2011-01-28 11:15 ` Andy Wingo
@ 2011-01-29 14:29 ` Ludovic Courtès
0 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2011-01-29 14:29 UTC (permalink / raw)
To: guile-devel
Hi Andy!
Andy Wingo <wingo@pobox.com> writes:
> On Wed 26 Jan 2011 19:31, Mike Gran <spk121@yahoo.com> writes:
>
>> recv, send, etc are clearly bytevector routines. But, if you want to keep
>> backward compatibility, you should have it handle both cases.
>>
>> IMHO, the idea of deprecating the use of strings is the wrong one. Either
>> be bold and get rid of strings for 2.0, or let them be both bytevector
>> and string functions for the foreseeable future.
>>
>> If strings remain an option, there will have to be some mention of the
>> "binary-friendly" Latin-1 encoding. ;-)
>
> I think recv and send should operate on bytevectors.
>
> I also think that if Guile is compiled with deprecated code enabled,
> strings should be supported, noting that the send message should be a
> latin-1 string, and that received bytes will be interpreted as latin-1
> characters. If passed a string, these functions will issue deprecation
> warnings. We change the documentation to speak of bytevectors and not
> strings.
I concur, I’ll do this.
Thanks,
Ludo’.