Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475

unofficial mirror of guile-devel@gnu.org 

* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
       [not found] <E1M8eXe-00072c-07@cvs.savannah.gnu.org>
@ 2009-05-25 22:22 ` Ludovic Courtès
  2009-05-25 23:14   ` Mike Gran
  2009-05-26 21:24   ` Andy Wingo
  0 siblings, 2 replies; 10+ messages in thread
From: Ludovic Courtès @ 2009-05-25 22:22 UTC
  To: Michael Gran; +Cc: guile-devel

Hello,

"Michael Gran" <spk121@yahoo.com> writes:

>     add tests for encoding/decoding wide strings

Nice!

Just a bit of cosmetic nitpicking:

>     	* test-suite/tests/encoding_utf8.test: new

Please use hyphens instead of underscores in file names, for
consistency.

> +(setlocale LC_ALL "en_US.utf8")

[...]

> +(setencoding "ASCII")

[...]

> +(setencoding "ISO-8859-7")

Do these modify the encoding used by the underlying port?  If so, I'd
rather explicitly use a fluid, as is done for `current-reader'.

However, this relies on eval-after-read semantics.  That is, if the
whole file is read at once, *then* evaluated, that won't work, right?

> +(setlocale LC_ALL "es_MX.ISO-8859-1")

Not everyone has this locale.  ;-)

> +(with-test-prefix
> + "internal encoding"
> +
> + (pass-if "ultima"
> +	  (string=? s1 (string-ints #xfa #x6c #x74 #x69 #x6d #x61)))

Please indent as is done in other files.

Thanks!
Ludo'.


* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
  2009-05-25 22:22 ` [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475 Ludovic Courtès
@ 2009-05-25 23:14   ` Mike Gran
  2009-05-26 21:24   ` Andy Wingo
  1 sibling, 0 replies; 10+ messages in thread
From: Mike Gran @ 2009-05-25 23:14 UTC
  To: Ludovic Courtès; +Cc: Guile Devel

On Tue, 2009-05-26 at 00:22 +0200, Ludovic Courtès wrote:
> Hello,
> 
> Just a bit of cosmetic nitpicking:
> 
> >     	* test-suite/tests/encoding_utf8.test: new
> 
> Please use hyphens instead of underscores in file names, for
> consistency.

OK

> 
> > +(setlocale LC_ALL "en_US.utf8")
> 
> [...]
> 
> > +(setencoding "ASCII")
> 
> [...]
> 
> > +(setencoding "ISO-8859-7")
> 
> Do these modify the encoding used by the underlying port?  If so, I'd
> rather explicitly use a fluid, as is done for `current-reader'.

For now, I have only one global port encoding.  So setlocale and
setencoding modify all subsequent port I/O.

> 
> However, this relies on eval-after-read semantics.  That is, if the
> whole file is read at once, *then* evaluated, that won't work, right?

The reader needs to know the encoding of a file by reading the file.
The way I have it set up right now source gets evaluated sequentially
and the reader changes encoding when setlocale or setencoding is
encountered.  Kludgy, but simple.
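
To make that dependency concrete, here is a toy model in Python (not
Guile code; `load_interleaved`, `load_batch`, and the one-form-per-line
framing are assumptions made up for this illustration).  It sketches why
a `setencoding` form only helps when each form is evaluated before the
next one is read:

```python
import re

# Toy model: every "form" is one line of bytes, and a form of the shape
# (setencoding "NAME") changes the single global encoding when evaluated.
SETENC = re.compile(r'\(setencoding "([^"]+)"\)')


def evaluate(form, state):
    """Evaluate one decoded form; only setencoding has an effect here."""
    m = SETENC.fullmatch(form.strip())
    if m:
        state["encoding"] = m.group(1)
    return form


def load_interleaved(raw_lines):
    """Read one form, evaluate it, then read the next one."""
    state = {"encoding": "ascii"}
    out = []
    for raw in raw_lines:
        form = raw.decode(state["encoding"], errors="replace")
        out.append(evaluate(form, state))
    return out


def load_batch(raw_lines):
    """Read (decode) every form first, then evaluate them all."""
    state = {"encoding": "ascii"}
    forms = [raw.decode(state["encoding"], errors="replace")
             for raw in raw_lines]
    return [evaluate(form, state) for form in forms]


# Second line holds three Greek letters encoded as ISO-8859-7 bytes.
source = [
    b'(setencoding "ISO-8859-7")',
    '(define s "\u03b1\u03b2\u03b3")'.encode("iso-8859-7"),
]
```

In this model `load_interleaved(source)` decodes the second form
correctly, while `load_batch(source)` has already decoded it as ASCII by
the time the `setencoding` form takes effect.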

I think Python sets the source encoding using a magic comment.  That's
another way to go: have the reader scan the comment blocks for a magic
comment before trying to evaluate the file.

> 
> > +(setlocale LC_ALL "es_MX.ISO-8859-1")
> 
> Not everyone has this locale.  ;-)

Viva la raza!

> 
> > +(with-test-prefix
> > + "internal encoding"
> > +
> > + (pass-if "ultima"
> > +	  (string=? s1 (string-ints #xfa #x6c #x74 #x69 #x6d #x61)))
> 
> Please indent as is done in other files.

OK

> 
> Thanks!
> Ludo'.

Thanks,

Mike





* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
  2009-05-25 22:22 ` [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475 Ludovic Courtès
  2009-05-25 23:14   ` Mike Gran
@ 2009-05-26 21:24   ` Andy Wingo
  2009-05-26 22:26     ` Mike Gran
  1 sibling, 1 reply; 10+ messages in thread
From: Andy Wingo @ 2009-05-26 21:24 UTC
  To: Ludovic Courtès; +Cc: guile-devel

On Tue 26 May 2009 00:22, ludo@gnu.org (Ludovic Courtès) writes:

> However, this relies on eval-after-read semantics.  That is, if the
> whole file is read at once, *then* evaluated, that won't work, right?

Or read at once, *then* compiled, *then* evaluated; or even, read one
expression at a time, compiled one at a time, but evaluated all of a
piece.

A
-- 
http://wingolog.org/





* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
  2009-05-26 21:24   ` Andy Wingo
@ 2009-05-26 22:26     ` Mike Gran
  2009-05-26 22:46       ` Andy Wingo
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Gran @ 2009-05-26 22:26 UTC
  To: Andy Wingo, Ludovic Courtès; +Cc: guile-devel

> From: Andy Wingo <wingo@pobox.com>

> 
> On Tue 26 May 2009 00:22, ludo@gnu.org (Ludovic Courtès) writes:
> 
> > However, this relies on eval-after-read semantics.  That is, if the
> > whole file is read at once, *then* evaluated, that won't work, right?
> 
> Or read at once, *then* compiled, *then* evaluated; or even, read one
> expression at a time, compiled one at a time, but evaluated all of a
> piece.

If one can't depend on the order of evaluation, the source encoding
has to become a pragma that is preprocessed.

The reader could probably preprocess the file looking for where 
the text "coding: XXXXX" appears within a comment in the top dozen
lines of a source code file. Or perhaps a line that is explicitly
";;;; #pragma coding: XXXXX" in the top few lines of a file.


* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
  2009-05-26 22:26     ` Mike Gran
@ 2009-05-26 22:46       ` Andy Wingo
  2009-05-28 14:37         ` Ludovic Courtès
  2009-05-28 19:57         ` Mike Gran
  0 siblings, 2 replies; 10+ messages in thread
From: Andy Wingo @ 2009-05-26 22:46 UTC
  To: Mike Gran; +Cc: Ludovic Courtès, guile-devel

Hi Mike,

On Wed 27 May 2009 00:26, Mike Gran <spk121@yahoo.com> writes:

> If one can't depend on the order of evaluation, the source encoding
> has to become a pragma that is preprocessed.

One can depend on the order of evaluation, but not on the order of
reading relative to evaluation.

> The reader could probably preprocess the file looking for where 
> the text "coding: XXXXX" appears within a comment in the top dozen
> lines of a source code file. Or perhaps a line that is explicitly
> ";;;; #pragma coding: XXXXX" in the top few lines of a file.

This sounds almost sane to me. I think python has a standard for this:

   http://www.python.org/dev/peps/pep-0263/

This is complicated in Guile by #!. A reasonable thing would be to have
the reader have a bit on whether it actually saw an expression yet or
not. If not, "^;+ [^\n]*coding: ..." would set the file's encoding.
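
A minimal sketch of that scan, in Python rather than in Guile's C
reader.  The function name `find_coding`, the exact pattern, and the
rule that a `#!` block ends on the first line containing `!#` are
assumptions for illustration, not anything taken from the patch:

```python
import re

# Loosely follows the pattern quoted above: one or more semicolons, a
# space, then "coding:" followed by an encoding name.
CODING = re.compile(r"^;+ .*coding: *([-\w.]+)")


def find_coding(raw):
    """Return the declared encoding name, or None.

    Only comment lines that appear before the first expression count.
    The bytes are decoded as Latin-1 so that every byte maps to exactly
    one character while the real encoding is still unknown.
    """
    in_block = False  # inside a "#! ... !#" block
    for line in raw.decode("latin-1").split("\n"):
        s = line.strip()
        if in_block:
            if "!#" in s:
                in_block = False
            continue
        if s.startswith("#!"):
            # Assumption: the block may also close on its opening line.
            in_block = "!#" not in s[2:]
            continue
        if not s:
            continue
        if not s.startswith(";"):
            return None  # an expression was seen first: stop looking
        m = CODING.match(s)
        if m:
            return m.group(1)
    return None
```

Under these assumptions a declaration that follows a `#!` block is still
found, and one that appears after the first expression is ignored.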

Andy
-- 
http://wingolog.org/


* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
  2009-05-26 22:46       ` Andy Wingo
@ 2009-05-28 14:37         ` Ludovic Courtès
  2009-05-28 18:21           ` Andy Wingo
  2009-05-28 19:57         ` Mike Gran
  1 sibling, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2009-05-28 14:37 UTC
  To: Andy Wingo; +Cc: guile-devel

Hello,

Andy Wingo <wingo@pobox.com> writes:

> This is complicated in Guile by #!. A reasonable thing would be to have
> the reader have a bit on whether it actually saw an expression yet or
> not. If not, "^;+ [^\n]*coding: ..." would set the file's encoding.

I think it would make sense to follow Emacs' specification of file-local
variables as closely as possible (info "(emacs) Specifying File
Variables"), as well as its naming scheme for encodings as shown by
`M-x list-coding-systems'.

Thanks,
Ludo'.





* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
  2009-05-28 14:37         ` Ludovic Courtès
@ 2009-05-28 18:21           ` Andy Wingo
  0 siblings, 0 replies; 10+ messages in thread
From: Andy Wingo @ 2009-05-28 18:21 UTC
  To: Ludovic Courtès; +Cc: guile-devel

Hi,

On Thu 28 May 2009 16:37, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> writes:
>
>> This is complicated in Guile by #!. A reasonable thing would be to have
>> the reader have a bit on whether it actually saw an expression yet or
>> not. If not, "^;+ [^\n]*coding: ..." would set the file's encoding.
>
> I think it would make sense to follow Emacs' specification of file-local
> variables as closely as possible (info "(emacs) Specifying File
> Variables"), as well as its naming scheme for encodings as shown by
> `M-x list-coding-systems'.

Good points. Although, I wonder how emacs does the right thing regarding
coding: if the variable list is at the end of a file. But certainly
recognizing it in the first two lines of the file would be robust and
follow emacs.
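
For reference, the first-line form that Emacs recognizes looks like
`-*- mode: scheme; coding: utf-8 -*-`.  A rough Python sketch of
picking `coding` out of the first two lines follows; the helper names
are hypothetical, and this covers only the `-*- ... -*-` line, not a
"Local Variables:" list at the end of a file:

```python
import re

# Everything between the first pair of -*- markers on a line.
FILE_VARS = re.compile(r"-\*-(.*?)-\*-")


def parse_file_variables(line):
    """Parse '-*- name: value; name: value -*-' into a dict."""
    m = FILE_VARS.search(line)
    if not m:
        return {}
    result = {}
    for item in m.group(1).split(";"):
        name, sep, value = item.partition(":")
        if sep:
            result[name.strip()] = value.strip()
        # An item without a colon (as in "-*- scheme -*-") names a
        # major mode in Emacs; this sketch simply skips it.
    return result


def coding_from_header(text):
    """Look for a coding variable in the first two lines only."""
    for line in text.split("\n")[:2]:
        coding = parse_file_variables(line).get("coding")
        if coding:
            return coding
    return None
```

Limiting the search to two lines leaves room for a `#!` line ahead of
the declaration.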

Andy
-- 
http://wingolog.org/





* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
  2009-05-26 22:46       ` Andy Wingo
  2009-05-28 14:37         ` Ludovic Courtès
@ 2009-05-28 19:57         ` Mike Gran
  2009-05-28 21:04           ` Ludovic Courtès
  2009-05-29  9:35           ` Andy Wingo
  1 sibling, 2 replies; 10+ messages in thread
From: Mike Gran @ 2009-05-28 19:57 UTC
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel

> From: Andy Wingo <wingo@pobox.com>
> 
> Hi Mike,
> 
> > The reader could probably preprocess the file looking for where 
> > the text "coding: XXXXX" appears within a comment in the top dozen
> > lines of a source code file. Or perhaps a line that is explicitly
> > ";;;; #pragma coding: XXXXX" in the top few lines of a file.
> 
> This sounds almost sane to me. I think python has a standard for this:
> 
>   http://www.python.org/dev/peps/pep-0263/
> 
> This is complicated in Guile by #!. A reasonable thing would be to have
> the reader have a bit on whether it actually saw an expression yet or
> not. If not, "^;+ [^\n]*coding: ..." would set the file's encoding.

Works for me.  I'll do that. 

Also, just for the record, it seems obvious that this character 
encoding pragma should only work on files, which is fine.  I think
that is the way it would work.  One could imagine a use where
someone loaded code into a string and then passed it to scm_read()
for interpretation.  In this case, I think "coding: XXXX" or
whatever should not be interpreted.

scm_read() can't handle this on its own because it has no "state".
It is called once per expression.

This all means that grepping the coding is a true preprocessing
step, divorced from the reader.

--

While we're on the topic, here's some serious pedantry about it all.
Fascinating to me, of course.  Less so to others, I'm sure.  Feel
free to zone out...

I went back and forth on the idea as to whether each port should have
its own dedicated character encoding, or if it was okay to have a
single encoding for all ports in a thread.  I've been going with the
single-encoding plan because R6RS I/O ports have a strong API for
that, while legacy Guile port API does not consider it.  I've been
trying not to modify Guile API.

For backwards compatibility, if no locale or encoding is set, Guile
ports should still function exactly as before.  I don't want to break
anything.

The medium-term plan is that if a program wants to read/write data that
is not in its locale encoding, it should prefer R6RS ports.  If it 
wants to read/write data in its current locale and encoding, Guile 
ports or R6RS ports should handle that transparently.

The procedure scm_read is firm API and takes a port, which means 
that the s-expression it reads will be interpreted in the context of
the port's encoding.  It is the default reader.

But, if the reader is modified to take its character encoding from
the top of the file, then the reader can't use scm_read directly 
as it would use the port's encoding.

It isn't as simple as pushing the old encoding, interpreting 
under the file's encoding, and then popping the old encoding, because
the output to stdout and stderr would then appear in the file's
encoding and not the terminal's locale's encoding.

So it needs a new reader, scm_read_with_encoding() or some such.

-Mike Gran


* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
  2009-05-28 19:57         ` Mike Gran
@ 2009-05-28 21:04           ` Ludovic Courtès
  2009-05-29  9:35           ` Andy Wingo
  1 sibling, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2009-05-28 21:04 UTC
  To: guile-devel

Hello,

Mike Gran <spk121@yahoo.com> writes:

> This all means that grepping the coding is a true preprocessing
> step, divorced from the reader.

Not necessarily.  The encoding can be stored in a fluid, or associated
with the current input port, and modified by `scm_read ()' as it
encounters encoding meta-data.
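
A rough model of the port-associated variant, again as a Python toy
rather than Guile code.  The `Port` class, its `seen_expression` flag,
and `read_line` are all hypothetical names; the point is only that the
state lives on the input port, so an output port keeps its own encoding:

```python
import io
import re

CODING = re.compile(r"^;+ .*coding: *([-\w.]+)")


class Port:
    """Toy port: a byte stream plus its own encoding and reader state."""

    def __init__(self, raw, encoding="ascii"):
        self.stream = io.BytesIO(raw)
        self.encoding = encoding
        self.seen_expression = False


def read_line(port):
    """Read one line, updating the port's encoding from a leading comment.

    Returns None at end of input.  An unknown encoding name would raise
    LookupError on decode; the sketch does not handle that case.
    """
    raw = port.stream.readline()
    if not raw:
        return None
    if not port.seen_expression:
        # Latin-1 maps each byte to one character, so it is safe to use
        # while the real encoding is still unknown.
        probe = raw.decode("latin-1").strip()
        m = CODING.match(probe)
        if m:
            port.encoding = m.group(1)
        elif probe and not probe.startswith(";"):
            port.seen_expression = True
    return raw.decode(port.encoding).rstrip("\n")
```

Because the encoding is mutated on the input port alone, a separate
port standing in for stdout is unaffected by what the source file
declares.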

Thanks,
Ludo'.






* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
  2009-05-28 19:57         ` Mike Gran
  2009-05-28 21:04           ` Ludovic Courtès
@ 2009-05-29  9:35           ` Andy Wingo
  1 sibling, 0 replies; 10+ messages in thread
From: Andy Wingo @ 2009-05-29  9:35 UTC
  To: Mike Gran; +Cc: Ludovic Courtès, guile-devel

Hey Mike,

On Thu 28 May 2009 21:57, Mike Gran <spk121@yahoo.com> writes:

> Also, just for the record, it seems obvious that this character 
> encoding pragma should only work on files, which is fine.  I think
> that is the way it would work.  One could imagine a use where
> someone loaded code into a string and then passed it to scm_read()
> for interpretation.  In this case, I think "coding: XXXX" or
> whatever should not be interpreted.

Hmm, dunno. I feel like many of Guile's users might be doing this. OTOH
they don't have `coding' support. I guess I can see your point here.

> scm_read() can't handle this on its own because it has no "state".
> It is called once per expression.

If scm_read() looks for the coding as a property of a port, I can
imagine it mutating that value too.

> The procedure scm_read is firm API and takes a port, which means 
> that the s-expression it reads will be interpreted in the context of
> the port's encoding.  It is the default reader.
>
> But, if the reader is modified to take its character encoding from
> the top of the file, then the reader can't use scm_read directly 
> as it would use the port's encoding.

Why not allow scm_read() to detect this, and modify the port's encoding?

Apologies if I missed the explanation :)

Andy
-- 
http://wingolog.org/



