* Needed: per-port reader options
@ 2012-10-13 18:51 Mark H Weaver
  2012-10-16 15:52 ` Ludovic Courtès
  0 siblings, 1 reply; 11+ messages in thread
From: Mark H Weaver @ 2012-10-13 18:51 UTC (permalink / raw)
  To: guile-devel

I recently tried to implement reader directives, e.g.:

  #!fold-case     (from R7RS)
  #!no-fold-case  (from R7RS)
  #!curly-infix   (from SRFI-105)

However, I ran into a rather serious problem. Guile's reader options
are global to the entire process, but clearly these reader directives
should affect only the port they are read from. So we need to support
per-port reader options.

Unfortunately, our entire 'scm_t_port' structure is apparently part of
our public API and thus cannot be changed in 2.0.x. Therefore, if we
hope to support either R7RS or SRFI-105 in 2.0.x, then I guess the only
option is to make a global weak key hash table, mapping ports to a table
of reader option overrides (and perhaps anything else we want to add to
the 'scm_t_port' structure). Of course this global hash table must be
protected by a mutex, which will mean additional overhead for every
'read'.

I hope that we can make 'scm_t_port' private in Guile 2.2, and thus
restore our freedom to modify its structure. Should we add deprecation
warnings in Guile 2.0 for applications that make direct use of this
structure? Any ideas how to accomplish that?

What do you think?

    Mark

^ permalink raw reply [flat|nested] 11+ messages in thread
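A minimal sketch of the side-table idea from the message above, assuming a fixed-size table keyed by an opaque port pointer and guarded by a POSIX mutex. Every name here (`port_overrides`, `port_set_fold_case`, `port_fold_case`, `port_forget`) is hypothetical and not part of Guile's API. A real weak-key table would drop an entry when the garbage collector reclaims the port; `port_forget` only imitates that by hand.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is Guile's actual API. */
enum { OVERRIDE_UNSET = -1 };   /* "no per-port override recorded" */
enum { MAX_PORTS = 64 };

struct port_overrides {
    const void *port;    /* opaque key; NULL marks a free slot */
    int fold_case;       /* OVERRIDE_UNSET, 0, or 1 */
    int curly_infix;     /* OVERRIDE_UNSET, 0, or 1 */
};

static struct port_overrides side_table[MAX_PORTS];
static pthread_mutex_t side_table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Find the slot for PORT, or claim a free one if CREATE is nonzero.
   The caller must hold side_table_lock. */
static struct port_overrides *slot_for(const void *port, int create) {
    struct port_overrides *free_slot = NULL;
    for (size_t i = 0; i < MAX_PORTS; i++) {
        if (side_table[i].port == port)
            return &side_table[i];
        if (side_table[i].port == NULL && free_slot == NULL)
            free_slot = &side_table[i];
    }
    if (!create || free_slot == NULL)
        return NULL;
    free_slot->port = port;
    free_slot->fold_case = OVERRIDE_UNSET;
    free_slot->curly_infix = OVERRIDE_UNSET;
    return free_slot;
}

/* Record a fold-case override for PORT.  Returns 0 on success, or -1 if
   the table is full. */
int port_set_fold_case(const void *port, int value) {
    assert(port != NULL);
    int status = -1;
    pthread_mutex_lock(&side_table_lock);
    struct port_overrides *slot = slot_for(port, 1);
    if (slot != NULL) {
        slot->fold_case = value;
        status = 0;
    }
    pthread_mutex_unlock(&side_table_lock);
    return status;
}

/* Effective fold-case setting for PORT: its override if one is recorded,
   otherwise GLOBAL_DEFAULT.  Every lookup pays for the lock. */
int port_fold_case(const void *port, int global_default) {
    assert(port != NULL);
    int result = global_default;
    pthread_mutex_lock(&side_table_lock);
    struct port_overrides *slot = slot_for(port, 0);
    if (slot != NULL && slot->fold_case != OVERRIDE_UNSET)
        result = slot->fold_case;
    pthread_mutex_unlock(&side_table_lock);
    return result;
}

/* Manual stand-in for what a weak table does when the port is collected. */
void port_forget(const void *port) {
    assert(port != NULL);
    pthread_mutex_lock(&side_table_lock);
    struct port_overrides *slot = slot_for(port, 0);
    if (slot != NULL)
        slot->port = NULL;
    pthread_mutex_unlock(&side_table_lock);
}
```

In this sketch a reader would call `port_fold_case(port, global_default)` once at the start of each read, which is where the per-`read` locking overhead mentioned above comes from.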
* Re: Needed: per-port reader options
From: Ludovic Courtès @ 2012-10-16 15:52 UTC
To: guile-devel

Hi,

Mark H Weaver <mhw@netris.org> skribis:

> I recently tried to implement reader directives, e.g.:
>
>   #!fold-case     (from R7RS)
>   #!no-fold-case  (from R7RS)
>   #!curly-infix   (from SRFI-105)
>
> However, I ran into a rather serious problem. Guile's reader options
> are global to the entire process, but clearly these reader directives
> should affect only the port they are read from. So we need to support
> per-port reader options.

I think we should instead be able to instantiate new readers–i.e., have
a higher-order ‘make-reader’ procedure (that’s how Guile-Reader started,
actually.)

That, combined with the ‘current-reader’ fluid, should provide good
flexibility.

As an example,
http://git.savannah.gnu.org/cgit/guile-reader.git/tree/modules/system/reader/confinement.scm
provides reader settings that are confined for each module.

Also, I think ports should remain reader-oblivious. ‘read’ is
fundamentally at a higher level of abstraction, so I’d personally prefer
not to have ports fiddle with it.

WDYT?

Thanks,
Ludo’.
* Re: Needed: per-port reader options
From: Mark H Weaver @ 2012-10-16 20:22 UTC
To: Ludovic Courtès; +Cc: guile-devel

ludo@gnu.org (Ludovic Courtès) writes:

> Mark H Weaver <mhw@netris.org> skribis:
>
>> I recently tried to implement reader directives, e.g.:
>>
>>   #!fold-case     (from R7RS)
>>   #!no-fold-case  (from R7RS)
>>   #!curly-infix   (from SRFI-105)
>>
>> However, I ran into a rather serious problem. Guile's reader options
>> are global to the entire process, but clearly these reader directives
>> should affect only the port they are read from. So we need to support
>> per-port reader options.
>
> I think we should instead be able to instantiate new readers–i.e., have
> a higher-order ‘make-reader’ procedure (that’s how Guile-Reader started,
> actually.)
>
> That, combined with the ‘current-reader’ fluid, should provide good
> flexibility.

Being able to easily create new readers sounds great to me, but that's
not sufficient to implement the reader directives above, the first two
of which are mandated by both R6RS and R7RS (draft).

Those standards mandate that we be able to change the case-sensitivity
flag on a per-port basis, while in the *middle* of reading a datum,
anywhere that comments are permitted.

Suppose we ignore those standards requirements. If someone wants to
distribute a portable module with case-insensitive symbols or
curly-infix syntax, how would you recommend that they portably arrange
for their file to be read by a special reader?

Even if you ignore these problems (which are bad enough), there's also
the issue that one must reimplement the entire reader in order to add
one new feature. This means reimplementing things like Guile's array
syntax (which is quite hairy), somehow hooking into 'read-hash-extend',
and keeping all of these reader implementations in sync with each other.

SRFI-105 is actually quite simple to implement, and in my latest patch
it required modification in only a few places.

> Also, I think ports should remain reader-oblivious. ‘read’ is
> fundamentally at a higher level of abstraction, so I’d personally prefer
> not to have ports fiddle with it.

I agree. In my patch, ports don't know anything about read. I merely
made the values of 'scm_i_port_weak_hash' into alists. Previously their
values were unused, and always #f. In Guile 2.2, I'd like to move this
alist into the port structure itself for better efficiency. The
built-in reader just adds an item to this alist if per-port read options
are set.

What do you think?

    Mark
* Re: Needed: per-port reader options
From: Ludovic Courtès @ 2012-10-16 21:39 UTC
To: Mark H Weaver; +Cc: guile-devel

Hi Mark,

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>> Mark H Weaver <mhw@netris.org> skribis:
>>
>>> I recently tried to implement reader directives, e.g.:
>>>
>>>   #!fold-case     (from R7RS)
>>>   #!no-fold-case  (from R7RS)
>>>   #!curly-infix   (from SRFI-105)
>>>
>>> However, I ran into a rather serious problem. Guile's reader options
>>> are global to the entire process, but clearly these reader directives
>>> should affect only the port they are read from. So we need to support
>>> per-port reader options.
>>
>> I think we should instead be able to instantiate new readers–i.e., have
>> a higher-order ‘make-reader’ procedure (that’s how Guile-Reader started,
>> actually.)
>>
>> That, combined with the ‘current-reader’ fluid, should provide good
>> flexibility.
>
> Being able to easily create new readers sounds great to me, but that's
> not sufficient to implement the reader directives above,

Of course. I just meant that, if you can call ‘make-reader’ with
whatever options you’re interested in, instead of globally changing the
reader’s option as is currently the case, then you’re halfway through.

And in that case, the reader doesn’t need to be associated with the
port. Instead, since ‘primitive-load’ honors ‘current-reader’, it just
needs to be set when loading a file. This way, any changes to the
reader’s options will be local to that file.

Yet, this avoids the need to explicitly have a port-to-reader mapping.
(That’s how I’ve used Guile-Reader in Skribilo, FWIW.)

Concretely, this would mean changing read.c such that each token reader
takes the reader options as an additional first parameter. Instead of
looking up the global ‘scm_read_opts’, they would look at this explicit
parameter.

How does that sound?

> the first two of which are mandated by both R6RS and R7RS (draft).
>
> Those standards mandate that we be able to change the case-sensitivity
> flag on a per-port basis, while in the *middle* of reading a datum,
> anywhere that comments are permitted.
>
> Suppose we ignore those standards requirements. If someone wants to
> distribute a portable module with case-insensitive symbols or
> curly-infix syntax, how would you recommend that they portably arrange
> for their file to be read by a special reader?
>
> Even if you ignore these problems (which are bad enough), there's also
> the issue that one must reimplement the entire reader in order to add
> one new feature. This means reimplementing things like Guile's array
> syntax (which is quite hairy), somehow hooking into 'read-hash-extend',
> and keeping all of these reader implementations in sync with each other.
>
> SRFI-105 is actually quite simple to implement, and in my latest patch
> it required modification in only a few places.

I agree we need these reader extensions, of course. I’m just wondering
about how to do it best.

>> Also, I think ports should remain reader-oblivious. ‘read’ is
>> fundamentally at a higher level of abstraction, so I’d personally prefer
>> not to have ports fiddle with it.
>
> I agree. In my patch, ports don't know anything about read. I merely
> made the values of 'scm_i_port_weak_hash' into alists. Previously their
> values were unused, and always #f. In Guile 2.2, I'd like to move this
> alist into the port structure itself for better efficiency. The
> built-in reader just adds an item to this alist if per-port read options
> are set.
>
> What do you think?

I think ‘current-reader’ should remove the need to have a port-to-reader
mapping, no?

Thanks for looking into this!

Ludo’.
* Re: Needed: per-port reader options
From: Noah Lavine @ 2012-10-16 21:57 UTC
To: Ludovic Courtès; +Cc: Mark H Weaver, guile-devel

Hello,

On Tue, Oct 16, 2012 at 5:39 PM, Ludovic Courtès <ludo@gnu.org> wrote:
> I think ‘current-reader’ should remove the need to have a port-to-reader
> mapping, no?
>
> Thanks for looking into this!
>
> Ludo’.
>

I might not understand this correctly, but aren't the reader flags
only supposed to affect the specific file they're reading from? That's
why we need a different set of reader options for each port. Of
course, there could be one global reader that has a
port-to-reader-options mapping, but that's a lot like a port-to-reader
mapping.

I think what you're suggesting is making a new reader for each input
file, letting that reader set its flags based on directives, and then
abandoning that reader when the file is read. Is that right?

I'm just trying to understand the issue,
Noah
* Re: Needed: per-port reader options
From: Ludovic Courtès @ 2012-10-16 22:01 UTC
To: Noah Lavine; +Cc: Mark H Weaver, guile-devel

Hi Noah,

Noah Lavine <noah.b.lavine@gmail.com> skribis:

> I might not understand this correctly, but aren't the reader flags
> only supposed to affect the specific file they're reading from? That's
> why we need a different set of reader options for each port. Of
> course, there could be one global reader that has a
> port-to-reader-options mapping, but that's a lot like a port-to-reader
> mapping.
>
> I think what you're suggesting is making a new reader for each input
> file, letting that reader set its flags based on directives, and then
> abandoning that reader when the file is read. Is that right?

Exactly. That’s what ‘load-in-vicinity’ supports, for instance.

Ludo’.
* Re: Needed: per-port reader options
From: Mark H Weaver @ 2012-10-17 3:34 UTC
To: Ludovic Courtès; +Cc: guile-devel

Hi Ludovic!

ludo@gnu.org (Ludovic Courtès) writes:

> Mark H Weaver <mhw@netris.org> skribis:
>> ludo@gnu.org (Ludovic Courtès) writes:
>>> Mark H Weaver <mhw@netris.org> skribis:
>>>
>>>> I recently tried to implement reader directives, e.g.:
>>>>
>>>>   #!fold-case     (from R7RS)
>>>>   #!no-fold-case  (from R7RS)
>>>>   #!curly-infix   (from SRFI-105)
>>>>
>>>> However, I ran into a rather serious problem. Guile's reader options
>>>> are global to the entire process, but clearly these reader directives
>>>> should affect only the port they are read from. So we need to support
>>>> per-port reader options.
>>>
>>> I think we should instead be able to instantiate new readers–i.e., have
>>> a higher-order ‘make-reader’ procedure (that’s how Guile-Reader started,
>>> actually.)
>>>
>>> That, combined with the ‘current-reader’ fluid, should provide good
>>> flexibility.
>>
>> Being able to easily create new readers sounds great to me, but that's
>> not sufficient to implement the reader directives above,
>
> Of course. I just meant that, if you can call ‘make-reader’ with
> whatever options you’re interested in, instead of globally changing the
> reader’s option as is currently the case, then you’re halfway through.
>
> And in that case, the reader doesn’t need to be associated with the
> port. Instead, since ‘primitive-load’ honors ‘current-reader’, it just
> needs to be set when loading a file. This way, any changes to the
> reader’s options will be local to that file.

I see a few problems with that.

First of all, since the reader directives can be placed anywhere that
comments are permitted, the read options must be changed while the
reader is in the middle of reading a single datum, e.g.:

  (#!no-fold-case BLAH ((BLAH #!fold-case BLAH)) BLAH)
  => (BLAH ((BLAH blah)) blah)

This cannot be done by replacing the entire reader, at least not if the
reader is written in a straightforward way. The reader options must be
changed during a single call to the reader, and then must be honored for
all subsequent tokens until overridden by a future directive.

Another problem is that these reader directives have nothing to do with
evaluation or loading of code. They affect the reader itself, and
therefore they should work as expected no matter what is doing the
reading. In other words, if a program uses 'read' on a data file, the
reader directives '#!fold-case' et al should affect all future calls to
'read' on that file.

Fluids cannot solve this problem, because the program might be
performing interleaved reads of multiple files within the same thread.

> Concretely, this would mean changing read.c such that each token reader
> takes the reader options as an additional first parameter. Instead of
> looking up the global ‘scm_read_opts’, they would look at this explicit
> parameter.

This is almost exactly what my patch does. I added an explicit
parameter of type 'scm_t_read_opts *' to most of the helper functions in
read.c, and that parameter is consulted instead of the global options.

When reader directives such as '#!fold-case' are encountered, both the
'scm_t_read_opts' struct and the per-port options are mutated.

'scm_read' initializes a local 'scm_t_read_opts' struct based on both
the global read options and the per-port overrides (if any), and a
pointer to that struct is passed down to all the helper functions in
read.c that need it.

What do you think?

    Mark
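The scheme described in this message (a local options struct seeded from the global options and the per-port overrides, passed explicitly to the helpers, with directives mutating both) can be sketched with a toy reader. This is an illustration under stated assumptions, not the patch itself: `read_opts` stands in for 'scm_t_read_opts', the override is stored directly in a hypothetical `toy_port` rather than in a weak hash table, and `toy_read` only handles parentheses, symbols, and the two fold-case directives.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* All names are hypothetical; this is a toy, not Guile's read.c. */
struct read_opts { int fold_case; };   /* stand-in for scm_t_read_opts */

struct toy_port {
    const char *text;         /* remaining input */
    int fold_case_override;   /* -1 = no override, else 0 or 1 */
};

static int global_fold_case = 0;       /* stand-in for the global option */

enum token { TOK_OPEN, TOK_CLOSE, TOK_SYMBOL, TOK_EOF };

static int is_delimiter(int c) {
    return c == '\0' || c == '(' || c == ')' || isspace(c);
}

/* Read one token into BUF.  Directives are skipped like comments, but they
   update both the options of the current read and the per-port override. */
static enum token next_token(struct toy_port *port, struct read_opts *opts,
                             char *buf, size_t size) {
    for (;;) {
        while (isspace((unsigned char) *port->text))
            port->text++;
        char c = *port->text;
        if (c == '\0')
            return TOK_EOF;
        if (c == '(' || c == ')') {
            port->text++;
            return c == '(' ? TOK_OPEN : TOK_CLOSE;
        }
        size_t n = 0;
        while (!is_delimiter((unsigned char) *port->text)) {
            if (n + 1 < size)
                buf[n++] = *port->text;
            port->text++;
        }
        buf[n] = '\0';
        if (strcmp(buf, "#!fold-case") == 0
            || strcmp(buf, "#!no-fold-case") == 0) {
            opts->fold_case = (buf[2] == 'f');
            port->fold_case_override = opts->fold_case;
            continue;
        }
        if (opts->fold_case)
            for (size_t i = 0; i < n; i++)
                buf[i] = (char) tolower((unsigned char) buf[i]);
        return TOK_SYMBOL;
    }
}

/* Read one datum from PORT and write its printed form to OUT.  Returns 1 if
   any text was produced (an unterminated list yields what was read so far),
   0 at end of input or if OUT is too small. */
int toy_read(struct toy_port *port, char *out, size_t out_size) {
    assert(port != NULL && out != NULL && out_size > 0);
    /* Local options: the global default plus this port's override, if any. */
    struct read_opts opts = { global_fold_case };
    if (port->fold_case_override != -1)
        opts.fold_case = port->fold_case_override;
    char buf[64];
    size_t len = 0;
    int depth = 0;
    out[0] = '\0';
    do {
        enum token tok = next_token(port, &opts, buf, sizeof buf);
        if (tok == TOK_EOF)
            return len > 0;
        const char *piece = tok == TOK_OPEN ? "(" : tok == TOK_CLOSE ? ")" : buf;
        size_t piece_len = strlen(piece);
        /* One space between elements, none after "(" or before ")". */
        size_t space = (len > 0 && out[len - 1] != '(' && tok != TOK_CLOSE) ? 1 : 0;
        if (len + space + piece_len + 1 > out_size)
            return 0;
        if (space)
            out[len++] = ' ';
        memcpy(out + len, piece, piece_len + 1);
        len += piece_len;
        depth += (tok == TOK_OPEN) - (tok == TOK_CLOSE);
    } while (depth > 0);
    return 1;
}
```

Given a `toy_port` whose text is the example datum from the message followed by `NEXT`, the first `toy_read` call yields `(BLAH ((BLAH blah)) blah)` and the second yields `next`, because the `#!fold-case` seen mid-datum was also stashed on the port for later calls.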
* Re: Needed: per-port reader options
From: Ludovic Courtès @ 2012-10-18 16:20 UTC
To: Mark H Weaver; +Cc: guile-devel

Hi Mark!

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:

[...]

>> Of course. I just meant that, if you can call ‘make-reader’ with
>> whatever options you’re interested in, instead of globally changing the
>> reader’s option as is currently the case, then you’re halfway through.
>>
>> And in that case, the reader doesn’t need to be associated with the
>> port. Instead, since ‘primitive-load’ honors ‘current-reader’, it just
>> needs to be set when loading a file. This way, any changes to the
>> reader’s options will be local to that file.
>
> I see a few problems with that.
>
> First of all, since the reader directives can be placed anywhere that
> comments are permitted, the read options must be changed while the
> reader is in the middle of reading a single datum,

Yes, but the reader can modify its own options data structure.

[...]

> In other words, if a program uses 'read' on a data file, the
> reader directives '#!fold-case' et al should affect all future calls to
> 'read' on that file.

Just on that file, or on any file subsequently loaded?

> Fluids cannot solve this problem, because the program might be
> performing interleaved reads of multiple files within the same thread.

SRFI-105 reads:

  An implementation of this SRFI MUST accept the marker #!curly-infix
  followed by a whitespace character in its standard datum readers [...]

  After reading this marker, the reader MUST accept curly-infix
  expressions in subsequent datums until it reaches an end-of-file [...]

To me, this sounds like global reader options (reset on EOF), not like
per-port options.

>> Concretely, this would mean changing read.c such that each token reader
>> takes the reader options as an additional first parameter. Instead of
>> looking up the global ‘scm_read_opts’, they would look at this explicit
>> parameter.
>
> This is almost exactly what my patch does. I added an explicit
> parameter of type 'scm_t_read_opts *' to most of the helper functions in
> read.c, and that parameter is consulted instead of the global options.

I like it.

> When reader directives such as '#!fold-case' are encountered, both the
> 'scm_t_read_opts' struct and the per-port options are mutated.
>
> 'scm_read' initializes a local 'scm_t_read_opts' struct based on both
> the global read options and the per-port overrides (if any), and a
> pointer to that struct is passed down to all the helper functions in
> read.c that need it.
>
> What do you think?

The patch you posted (“Implement per-port reader options, #!fold-case
and #!no-fold-case.”) does all three things at once: (1) explicit
instead of global reader options, (2) per-port reader options, and (3)
fold-case.

Do you think you could split it into 3 patches?

I’m happy with (1) and (3). I remain skeptical about (2), because of
the mixture of concerns.

Sorry for the extra work, but thank you for pushing these things!

Ludo’.
* Re: Needed: per-port reader options
From: Mark H Weaver @ 2012-10-18 17:48 UTC
To: Ludovic Courtès; +Cc: guile-devel

ludo@gnu.org (Ludovic Courtès) writes:

> Hi Mark!
>
> Mark H Weaver <mhw@netris.org> skribis:
>
>> ludo@gnu.org (Ludovic Courtès) writes:
>
> [...]
>
>>> Of course. I just meant that, if you can call ‘make-reader’ with
>>> whatever options you’re interested in, instead of globally changing the
>>> reader’s option as is currently the case, then you’re halfway through.
>>>
>>> And in that case, the reader doesn’t need to be associated with the
>>> port. Instead, since ‘primitive-load’ honors ‘current-reader’, it just
>>> needs to be set when loading a file. This way, any changes to the
>>> reader’s options will be local to that file.
>>
>> I see a few problems with that.
>>
>> First of all, since the reader directives can be placed anywhere that
>> comments are permitted, the read options must be changed while the
>> reader is in the middle of reading a single datum,
>
> Yes, but the reader can modify its own options data structure.

True, but it cannot arrange for subsequent calls to 'read' on that port
to use the new options data structure without stashing those changed
options somewhere.

Also, modifying its own options data structure cannot work for a reader
directive like #!curly-infix unless curly-infix is supported by the
default reader.

>> In other words, if a program uses 'read' on a data file, the
>> reader directives '#!fold-case' et al should affect all future calls to
>> 'read' on that file.
>
> Just on that file, or on any file subsequently loaded?

Just on that file. That's the only sane thing. Think about it.
Suppose you're reading multiple files in an interleaved fashion (perhaps
via a lazy stream that reads the files on demand), and a #!fold-case in
one file changed the way the other files were read. That would be
totally broken, don't you agree?

FWIW, I tested the behavior of Chibi Scheme, and it does the right
thing, exactly as I have described and implemented.

>> Fluids cannot solve this problem, because the program might be
>> performing interleaved reads of multiple files within the same thread.
>
> SRFI-105 reads:
>
>   An implementation of this SRFI MUST accept the marker #!curly-infix
>   followed by a whitespace character in its standard datum readers [...]
>
>   After reading this marker, the reader MUST accept curly-infix
>   expressions in subsequent datums until it reaches an end-of-file [...]
>
> To me, this sounds like global reader options (reset on EOF), not like
> per-port options.

Really? "until it reaches an end-of-file" sounds like per-port to me.

Not convinced? Imagine the same thought experiment I proposed above,
with one thread reading multiple files in an interleaved way via lazy
streams. With your proposal, not only does the appearance of
#!fold-case mysteriously change the way the other files are read at some
random point depending on the interleaving, but now, when the original
file's EOF is found, the other files mysteriously change back to
case-sensitive mode!

Can you find _any_ Scheme implementation that handles reader directives
that way? If so, please let me know, because I have a strongly worded
bug report to file :)

>>> Concretely, this would mean changing read.c such that each token reader
>>> takes the reader options as an additional first parameter. Instead of
>>> looking up the global ‘scm_read_opts’, they would look at this explicit
>>> parameter.
>>
>> This is almost exactly what my patch does. I added an explicit
>> parameter of type 'scm_t_read_opts *' to most of the helper functions in
>> read.c, and that parameter is consulted instead of the global options.
>
> I like it.

Excellent!

>> When reader directives such as '#!fold-case' are encountered, both the
>> 'scm_t_read_opts' struct and the per-port options are mutated.
>>
>> 'scm_read' initializes a local 'scm_t_read_opts' struct based on both
>> the global read options and the per-port overrides (if any), and a
>> pointer to that struct is passed down to all the helper functions in
>> read.c that need it.
>>
>> What do you think?
>
> The patch you posted (“Implement per-port reader options, #!fold-case
> and #!no-fold-case.”) does all three things at once: (1) explicit
> instead of global reader options, (2) per-port reader options, and (3)
> fold-case.
>
> Do you think you could split it into 3 patches?

Fair enough, I agree that this makes sense. I'll work on it.

> I’m happy with (1) and (3). I remain skeptical about (2), because of
> the mixture of concerns.

I don't think it's really a mixture of concerns. The port just provides
an alist for anyone to use, without caring what is put there. 'read'
needs to know about ports, but that's always been the case.

> Sorry for the extra work, but thank you for pushing these things!

No problem, thanks for discussing it :)

    Mark
* Re: Needed: per-port reader options
From: Mark H Weaver @ 2012-10-19 2:12 UTC
To: Ludovic Courtès; +Cc: guile-devel

Section 2.1 of the R7RS (draft 6) explicitly says "The #!fold-case
directive causes the read procedure to case-fold [...] each identifier
and character name subsequently read from the same port."

Here's the relevant excerpt:

  The following directives give explicit control over case folding.

  #!fold-case
  #!no-fold-case

  These directives may appear anywhere comments are permitted (see
  section 2.2) and are treated as comments, except that they affect the
  reading of subsequent data. The #!fold-case directive causes the read
  procedure to case-fold (as if by string-foldcase; see section 6.7)
  each identifier and character name subsequently read from the same
  port. (It has no effect on character literals.) The #!no-fold-case
  directive causes the read procedure to return to the default,
  non-folding behavior.

Regards,
Mark
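The "same port" wording quoted above is what separates per-port state from a single shared flag once reads are interleaved. A minimal sketch of that thought experiment, assuming a toy `source` type and a hypothetical `read_symbol` helper that only understands whitespace-separated symbols and the two directives; the caller decides where the flag lives by passing a pointer to it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not Guile code. */
struct source {
    const char *text;   /* remaining input: whitespace-separated symbols */
    int fold_case;      /* this source's own flag */
};

/* Read the next symbol from SRC into OUT, honoring #!fold-case and
   #!no-fold-case.  The flag lives wherever FLAG points, so the caller
   chooses between per-source state and one shared state.
   Returns 1 if a symbol was read, 0 at end of input. */
int read_symbol(struct source *src, int *flag, char *out, size_t size) {
    assert(src != NULL && flag != NULL && out != NULL && size > 0);
    for (;;) {
        while (isspace((unsigned char) *src->text))
            src->text++;
        if (*src->text == '\0')
            return 0;
        size_t n = 0;
        while (*src->text != '\0' && !isspace((unsigned char) *src->text)) {
            if (n + 1 < size)
                out[n++] = *src->text;
            src->text++;
        }
        out[n] = '\0';
        if (strcmp(out, "#!fold-case") == 0) {
            *flag = 1;              /* directive: treated as a comment */
        } else if (strcmp(out, "#!no-fold-case") == 0) {
            *flag = 0;
        } else {
            if (*flag)
                for (size_t i = 0; i < n; i++)
                    out[i] = (char) tolower((unsigned char) out[i]);
            return 1;
        }
    }
}
```

Take one source containing `#!fold-case FOO` and another containing `BAR BAZ`, and read them in the order second, first, second. Passing each source's own `fold_case` gives `BAR`, `foo`, `BAZ`; passing one shared `int` to every call gives `BAR`, `foo`, `baz`, which is the leak between files that the thread objects to.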
* Re: Needed: per-port reader options
From: Ludovic Courtès @ 2012-10-21 21:40 UTC
To: Mark H Weaver; +Cc: guile-devel

Hi,

Mark H Weaver <mhw@netris.org> skribis:

> Section 2.1 of the R7RS (draft 6) explicitly says "The #!fold-case
> directive causes the read procedure to case-fold [...] each identifier
> and character name subsequently read from the same port."

OK, this is more precise than SRFI-105, and definitely per-port (the
semantics are not quite to my taste, but well...)

Then the weak hash table mapping ports to reader options seems like the
“right” approach, to implement these semantics.

Ludo’.