* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
From: Ludovic Courtès @ 2009-05-25 22:22 UTC
To: Michael Gran; +Cc: guile-devel

Hello,

"Michael Gran" <spk121@yahoo.com> writes:

> add tests for encoding/decoding wide strings

Nice!

Just a bit of cosmetic nitpicking:

> * test-suite/tests/encoding_utf8.test: new

Please use hyphens instead of underscores in file names, for
consistency.

> +(setlocale LC_ALL "en_US.utf8")

[...]

> +(setencoding "ASCII")

[...]

> +(setencoding "ISO-8859-7")

Do these modify the encoding used by the underlying port?  If so, I'd
rather explicitly use a fluid, as is done for `current-reader'.

However, this relies on eval-after-read semantics.  That is, if the
whole file is read at once, *then* evaluated, that won't work, right?

> +(setlocale LC_ALL "es_MX.ISO-8859-1")

Not everyone has this locale.  ;-)

> +(with-test-prefix
> + "internal encoding"
> +
> + (pass-if "ultima"
> + (string=? s1 (string-ints #xfa #x6c #x74 #x69 #x6d #x61)))

Please indent as is done in other files.

Thanks!

Ludo'.
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
From: Mike Gran @ 2009-05-25 23:14 UTC
To: Ludovic Courtès; +Cc: Guile Devel

On Tue, 2009-05-26 at 00:22 +0200, Ludovic Courtès wrote:
> Hello,
>
> Just a bit of cosmetic nitpicking:
>
> > * test-suite/tests/encoding_utf8.test: new
>
> Please use hyphens instead of underscores in file names, for
> consistency.

OK

> > +(setlocale LC_ALL "en_US.utf8")
>
> [...]
>
> > +(setencoding "ASCII")
>
> [...]
>
> > +(setencoding "ISO-8859-7")
>
> Do these modify the encoding used by the underlying port?  If so, I'd
> rather explicitly use a fluid, as is done for `current-reader'.

For now, I have only one global port encoding.  So setlocale and
setencoding modify all subsequent port I/O.

> However, this relies on eval-after-read semantics.  That is, if the
> whole file is read at once, *then* evaluated, that won't work, right?

The reader can only learn the encoding of a file by reading the file.
The way I have it set up right now, source gets evaluated sequentially
and the reader changes encoding when setlocale or setencoding is
encountered.  Kludgy, but simple.

I think Python sets the source encoding using a magic comment.  That's
another way to go: have the reader scan the comment blocks for a magic
comment before trying to evaluate the file.

> > +(setlocale LC_ALL "es_MX.ISO-8859-1")
>
> Not everyone has this locale.  ;-)

Viva la raza!

> > +(with-test-prefix
> > + "internal encoding"
> > +
> > + (pass-if "ultima"
> > + (string=? s1 (string-ints #xfa #x6c #x74 #x69 #x6d #x61)))
>
> Please indent as is done in other files.

OK

> Thanks!
> Ludo'.

Thanks,
Mike
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
From: Andy Wingo @ 2009-05-26 21:24 UTC
To: Ludovic Courtès; +Cc: guile-devel

On Tue 26 May 2009 00:22, ludo@gnu.org (Ludovic Courtès) writes:

> However, this relies on eval-after-read semantics.  That is, if the
> whole file is read at once, *then* evaluated, that won't work, right?

Or read at once, *then* compiled, *then* evaluated; or even, read one
expression at a time, compiled one at a time, but evaluated all of a
piece.

A
--
http://wingolog.org/
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
From: Mike Gran @ 2009-05-26 22:26 UTC
To: Andy Wingo, Ludovic Courtès; +Cc: guile-devel

> From: Andy Wingo <wingo@pobox.com>
>
> On Tue 26 May 2009 00:22, ludo@gnu.org (Ludovic Courtès) writes:
>
> > However, this relies on eval-after-read semantics.  That is, if the
> > whole file is read at once, *then* evaluated, that won't work, right?
>
> Or read at once, *then* compiled, *then* evaluated; or even, read one
> expression at a time, compiled one at a time, but evaluated all of a
> piece.

If one can't depend on the order of evaluation, then the source encoding
has to become a pragma that is preprocessed.

The reader could probably preprocess the file looking for where
the text "coding: XXXXX" appears within a comment in the top dozen
lines of a source code file.  Or perhaps a line that is explicitly
";;;; #pragma coding: XXXXX" in the top few lines of a file.
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
From: Andy Wingo @ 2009-05-26 22:46 UTC
To: Mike Gran; +Cc: Ludovic Courtès, guile-devel

Hi Mike,

On Wed 27 May 2009 00:26, Mike Gran <spk121@yahoo.com> writes:

> If one can't depend on the order of evaluation, then the source encoding
> has to become a pragma that is preprocessed.

One can depend on the order of evaluation, but not on the order of
reading and evaluation.

> The reader could probably preprocess the file looking for where
> the text "coding: XXXXX" appears within a comment in the top dozen
> lines of a source code file.  Or perhaps a line that is explicitly
> ";;;; #pragma coding: XXXXX" in the top few lines of a file.

This sounds almost sane to me.  I think Python has a standard for this:

  http://www.python.org/dev/peps/pep-0263/

This is complicated in Guile by #!.  A reasonable thing would be for
the reader to keep a bit recording whether it has actually seen an
expression yet.  If not, a line matching "^;+ [^\n]*coding: ..." would
set the file's encoding.

Andy
--
http://wingolog.org/
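The rule Andy describes can be sketched roughly as follows. This is Python used purely for illustration, not Guile's reader. The name `detect_coding` is hypothetical, the `[-\w.]+` syntax for the encoding name is an assumption (the pattern in the message leaves that part elided), and skipping a leading `#!` ... `!#` block is only one guess at how the #! complication might be handled.

```python
import re

# Pattern from the message above; the capture group for the encoding
# name is an assumption, since the original leaves it as "...".
CODING_RE = re.compile(r"^;+ [^\n]*coding: ([-\w.]+)")


def detect_coding(lines):
    """Return the declared coding, or None if an expression comes first."""
    in_block_comment = False
    for line in lines:
        stripped = line.strip()
        if in_block_comment:
            # Inside "#! ... !#": ignore everything up to the closing "!#".
            if stripped.endswith("!#"):
                in_block_comment = False
            continue
        if stripped.startswith("#!"):
            # Assumed handling of a script header written as a block comment.
            in_block_comment = not stripped.endswith("!#")
            continue
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
        if stripped == "" or stripped.startswith(";"):
            continue  # blank line or ordinary comment: keep looking
        return None  # first expression seen: the declaration window is closed
    return None
```

Returning as soon as the first non-comment line appears plays the role of the "has the reader seen an expression yet" bit.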
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
From: Ludovic Courtès @ 2009-05-28 14:37 UTC
To: Andy Wingo; +Cc: guile-devel

Hello,

Andy Wingo <wingo@pobox.com> writes:

> This is complicated in Guile by #!.  A reasonable thing would be to have
> the reader have a bit on whether it actually saw an expression yet or
> not.  If not, "^;+ [^\n]*coding: ..." would set the file's encoding.

I think it would make sense to follow Emacs' specification of file-local
variables as closely as possible (info "(emacs) Specifying File
Variables"), as well as its naming scheme for encodings as shown by
`M-x list-coding-systems'.

Thanks,
Ludo'.
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
From: Andy Wingo @ 2009-05-28 18:21 UTC
To: Ludovic Courtès; +Cc: guile-devel

Hi,

On Thu 28 May 2009 16:37, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> writes:
>
>> This is complicated in Guile by #!.  A reasonable thing would be to have
>> the reader have a bit on whether it actually saw an expression yet or
>> not.  If not, "^;+ [^\n]*coding: ..." would set the file's encoding.
>
> I think it would make sense to follow Emacs' specification of file-local
> variables as closely as possible (info "(emacs) Specifying File
> Variables"), as well as its naming scheme for encodings as shown by
> `M-x list-coding-systems'.

Good points.  Although, I wonder how Emacs does the right thing
regarding coding: if the variable list is at the end of a file.  But
certainly recognizing it in the first two lines of the file would be
robust and follow Emacs.

Andy
--
http://wingolog.org/
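As a sketch of the "first two lines" idea, the following Python looks for an Emacs-style `-*- ... -*-` file-variables line and extracts its `coding` entry. The function name `coding_from_first_lines` is hypothetical, and this handles only the simple `name: value; name: value` form; it makes no attempt to cover everything Emacs accepts, and it does not map names onto Emacs' coding-system list.

```python
import re

# Body of an Emacs-style "-*- ... -*-" file-variables line.
FILE_VARS_RE = re.compile(r"-\*-(.*?)-\*-")


def coding_from_first_lines(text):
    """Look for a `coding` file variable in the first two lines only."""
    for line in text.splitlines()[:2]:
        match = FILE_VARS_RE.search(line)
        if not match:
            continue
        # Entries are separated by ";" and written as "name: value".
        for entry in match.group(1).split(";"):
            name, sep, value = entry.partition(":")
            if sep and name.strip().lower() == "coding":
                return value.strip()
    return None
```

Restricting the search to two lines leaves room for a `#!` line followed by the declaration, at the cost of ignoring a variable list at the end of the file.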
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
From: Mike Gran @ 2009-05-28 19:57 UTC
To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel

> From: Andy Wingo <wingo@pobox.com>
>
> Hi Mike,
>
> > The reader could probably preprocess the file looking for where
> > the text "coding: XXXXX" appears within a comment in the top dozen
> > lines of a source code file.  Or perhaps a line that is explicitly
> > ";;;; #pragma coding: XXXXX" in the top few lines of a file.
>
> This sounds almost sane to me.  I think python has a standard for this:
>
>   http://www.python.org/dev/peps/pep-0263/
>
> This is complicated in Guile by #!.  A reasonable thing would be to have
> the reader have a bit on whether it actually saw an expression yet or
> not.  If not, "^;+ [^\n]*coding: ..." would set the file's encoding.

Works for me.  I'll do that.

Also, just for the record, it seems obvious that this character
encoding pragma should only work on files, which is fine.  I think
that is the way it would work.  One could imagine a use where
someone loaded code into a string and then passed it to scm_read()
for interpretation.  In this case, I think "coding: XXXX" or
whatever should not be interpreted.

scm_read() can't handle this on its own because it has no "state".
It is called once per expression.

This all means that grepping the coding is a true preprocessing
step, divorced from the reader.

--

While we're on the topic, here's some serious pedantry about it all.
Fascinating to me, of course.  Less so to others, I'm sure.  Feel free
to zone out...

I went back and forth on the idea as to whether each port should
have its own dedicated character encoding, or if it was okay to have
a single encoding for all ports in a thread.  I've been going with the
single-encoding plan because R6RS I/O ports have a strong API for
per-port encoding, while the legacy Guile port API does not consider it.
I've been trying not to modify the Guile API.

For backwards compatibility, if no locale or encoding is set, Guile
ports should still function exactly as before.  I don't want to
break anything.

The medium-term plan is that if a program wants to read/write data
that is not in its locale encoding, it should prefer R6RS ports.
If it wants to read/write data in its current locale and encoding,
Guile ports or R6RS ports should handle that transparently.

The procedure scm_read is firm API and takes a port, which means
that the s-expression it reads will be interpreted in the context of
the port's encoding.  It is the default reader.

But, if the reader is modified to take its character encoding from
the top of the file, then the reader can't use scm_read directly
as it would use the port's encoding.

It isn't as simple as pushing the old encoding, interpreting under
the file's encoding, and then popping the old encoding, because the
output to stdout and stderr would then appear in the file's encoding
and not the terminal's locale's encoding.

So it needs a new reader, scm_read_with_encoding() or some such.

-Mike Gran
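The push/pop problem Mike raises can be illustrated with a small Python sketch in which the source encoding is a parameter of the read, not global state, so output keeps the terminal's encoding. The function `read_with_encoding` is hypothetical and only echoes the "scm_read_with_encoding() or some such" idea; the in-memory "terminal" is likewise an assumption made so the example is self-contained.

```python
import io


def read_with_encoding(source_bytes, source_encoding, out):
    """Decode source with its own encoding; `out` keeps its own encoding."""
    text = source_bytes.decode(source_encoding)
    out.write(text)  # re-encoded by `out`, not in the source's encoding
    out.flush()
    return text


# The bytes from the test in the patch: #xfa #x6c #x74 #x69 #x6d #x61,
# read here as ISO-8859-1.
source = bytes([0xFA, 0x6C, 0x74, 0x69, 0x6D, 0x61])

# Stand-in for a terminal whose locale encoding is UTF-8.
terminal_bytes = io.BytesIO()
terminal = io.TextIOWrapper(terminal_bytes, encoding="utf-8")

text = read_with_encoding(source, "iso-8859-1", terminal)
```

Because the encoding travels with the call, nothing has to be pushed or popped, and the bytes written to the terminal differ from the bytes of the source file.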
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
From: Ludovic Courtès @ 2009-05-28 21:04 UTC
To: guile-devel

Hello,

Mike Gran <spk121@yahoo.com> writes:

> This all means that grepping the coding is a true preprocessing
> step, divorced from the reader.

Not necessarily.  The encoding can be stored in a fluid, or associated
with the current input port, and modified by `scm_read ()' as it
encounters encoding meta-data.

Thanks,
Ludo'.
* Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475
From: Andy Wingo @ 2009-05-29 9:35 UTC
To: Mike Gran; +Cc: Ludovic Courtès, guile-devel

Hey Mike,

On Thu 28 May 2009 21:57, Mike Gran <spk121@yahoo.com> writes:

> Also, just for the record, it seems obvious that this character
> encoding pragma should only work on files, which is fine.  I think
> that is the way it would work.  One could imagine a use where
> someone loaded code into a string and then passed it to scm_read()
> for interpretation.  In this case, I think "coding: XXXX" or
> whatever should not be interpreted.

Hmm, dunno.  I feel like many of Guile's users might be doing this.
OTOH they don't have `coding' support.  I guess I can see your point
here.

> scm_read() can't handle this on its own because it has no "state".
> It is called once per expression.

If scm_read() looks for the coding as a property of a port, I can
imagine it mutating that value too.

> The procedure scm_read is firm API and takes a port, which means
> that the s-expression it reads will be interpreted in the context of
> the port's encoding.  It is the default reader.
>
> But, if the reader is modified to take its character encoding from
> the top of the file, then the reader can't use scm_read directly
> as it would use the port's encoding.

Why not allow scm_read() to detect this, and modify the port's encoding?
Apologies if I missed the explanation :)

Andy
--
http://wingolog.org/
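The alternative Ludovic and Andy suggest, where the encoding lives on the port and the reader mutates it on seeing a declaration, might look roughly like this in Python. The `Port` class and `read_all` function are hypothetical stand-ins, not Guile's port or reader API; the sketch reads whole lines rather than expressions, assumes an ASCII-compatible encoding so lines can be split at the byte level, and does not stop looking after the first expression.

```python
import io
import re

# Assumed declaration syntax, searched for only inside ";" comment lines.
CODING_RE = re.compile(r"coding: ([-\w.]+)")


class Port:
    """Hypothetical port: a byte stream plus a mutable encoding."""

    def __init__(self, data, encoding="ascii"):
        self.stream = io.BytesIO(data)
        self.encoding = encoding

    def read_line(self):
        raw = self.stream.readline()
        # Decode with whatever encoding the port has at this moment.
        return raw.decode(self.encoding) if raw else None


def read_all(port):
    """Read every line, switching the port's encoding on a coding comment."""
    lines = []
    line = port.read_line()
    while line is not None:
        match = CODING_RE.search(line) if line.startswith(";") else None
        if match:
            port.encoding = match.group(1)  # affects all later reads
        lines.append(line)
        line = port.read_line()
    return lines
```

Since the state lives on the port, each call to the reader can stay stateless, and other ports, such as the one behind stdout, are left alone.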