* Re: GNU Guile PEG-parser
       [not found] <CAO_vGe8tm2=gyhF4vKrYV=mU9gEpbrmGwsr0JnsnF9JqvfaMuA@mail.gmail.com>
From: Noah Lavine @ 2012-02-08 0:47 UTC
To: Krister Svanlund; +Cc: guile-devel

Hello,

Thanks for emailing! I suppose I am the one to talk to, since I was
the last one to work on it.

I didn't make the PEG parsing syntax, but I would guess the reason
there isn't a string syntax for ignore is that there's no conventional
way to write it, whereas there is for the other PEG elements. It would
be easy to add one if it were useful, but we'd want to make sure our
syntax agreed with other PEG libraries, so people wouldn't be confused
later.

For blank-space indented blocks, do you mean you want to group
together lines with the same indentation, like Python syntax? If you
know what the indentation will be at the beginning of each line, you
can do something like this:

    (* (and "\t" <match-line> "\n")),

where you replace "\t" with whatever indentation you want.

However, what you probably want to do is look at the indentation in
the first line and then group it with every following line that has
the same indentation. I'm not sure if it's possible, but it would
probably be ugly. If you tell me what you're trying to do, though, I
can help you write your own parser to handle it. You can even write
some of your parser yourself and use PEGs for the rest, if you're
willing to use PEG internals.

Can you tell me more about what you're trying to do? I am happy to
help now, but I will be more helpful if I know more.

I'm going to CC the guile-devel mailing list because of the issue with
the string syntax.

Noah

On Tue, Feb 7, 2012 at 10:03 AM, Krister Svanlund
<krister.svanlund@gmail.com> wrote:
> Hi,
> I'm currently involved in a project that plans on using the PEG module
> for Guile for parsing, and I've understood that you are the one to talk
> to about it. I'm mostly just curious how come there isn't an equivalent
> to ignore in string-patterns, and whether this would be complex to add?
>
> I'm also curious if there is any way to deal with blank-space indented
> blocks in PEG.
>
> Yours
> Krister Svanlund

* Re: GNU Guile PEG-parser
From: Krister Svanlund @ 2012-02-08 18:29 UTC
To: Noah Lavine; +Cc: guile-devel

Hi,
thanks for a quick response!

I've actually found no PEG library that has a string syntax for the
equivalent of ignore. I'm guessing most people are satisfied with just
specifying another nonterminal and matching that one, probably because
it is seen as less ugly than extending the formal definition of PEG.
But I really think we could get a cleaner PEG definition of our parser
if we were able to ignore text that isn't needed or gets in the way
while using string-patterns.

It's actually exactly Python I'm thinking about. We are currently
writing a preprocessor that puts #{ and #} before and after each
block, but I was hoping that there exists a cleaner solution using the
power of PEG instead of basic string manipulation. If you could help
in any way, shape or form, that would be greatly appreciated; even
just suggesting which parts of the PEG internals to look at would be
really useful.

I hope you or the guile-devel list can be of help.

Yours,
Krister Svanlund

* Re: GNU Guile PEG-parser
From: Noah Lavine @ 2012-02-09 15:42 UTC
To: Krister Svanlund; +Cc: guile-devel

Hello,

> I've actually found no PEG library that has a string syntax for the
> equivalent of ignore. I'm guessing most people are satisfied with just
> specifying another nonterminal and matching that one, probably because
> it is seen as less ugly than extending the formal definition of PEG.
> But I really think we could get a cleaner PEG definition of our parser
> if we were able to ignore text that isn't needed or gets in the way
> while using string-patterns.

That makes sense. I'm a bit surprised that you find string patterns
easier than S-expression patterns, because I find it the other way
around, but different people like different things. I think we could
add some string syntax for ignore if you wanted it, although other
people on the list should chime in.

> It's actually exactly Python I'm thinking about. We are currently
> writing a preprocessor that puts #{ and #} before and after each
> block, but I was hoping that there exists a cleaner solution using the
> power of PEG instead of basic string manipulation. If you could help
> in any way, shape or form, that would be greatly appreciated; even
> just suggesting which parts of the PEG internals to look at would be
> really useful.

After thinking about it more, you have two choices. The easiest thing
would be to parse each line (or extended line, if it ends with "\")
with a PEG parser, and use your own logic for the blocks. Your parser
would have two steps for each line:

1. Get the indent from the beginning of a line
2. Parse the rest of the line with a PEG parser

Then you would take the lines returned by the PEG parser and combine
them into a data structure yourself, using the Python line-combining
rules. This is probably your best choice.
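
The two per-line steps and the block-combining pass described above can
be sketched roughly as follows. This is only an illustration in Python,
not Guile code: `Node`, `split_indent`, `parse_line` and `build_blocks`
are hypothetical names, `parse_line` is a trivial stand-in for whatever
PEG parser handles the rest of the line, and the indent width is assumed
to be the count of leading space or tab characters. Continuation lines
ending in "\" and inconsistent dedents are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One parsed line plus the block of lines nested under it."""
    content: object
    children: list = field(default_factory=list)


def split_indent(line):
    """Step 1: separate the leading whitespace from the rest of the line."""
    body = line.lstrip(" \t")
    # Assumption: indent width is the number of leading characters.
    return len(line) - len(body), body


def parse_line(body):
    """Step 2: stand-in for the PEG parser applied to the rest of the line."""
    return body.split()


def build_blocks(source):
    """Combine parsed lines into a tree based on their indentation."""
    root = Node(content=None)
    stack = [(-1, root)]  # (indent, node) for every block still open
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines neither open nor close a block
        indent, body = split_indent(raw)
        # Close every open block indented at least as far as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        node = Node(content=parse_line(body))
        stack[-1][1].children.append(node)
        stack.append((indent, node))
    return root
```

Keeping a stack of open blocks means the PEG grammar never has to know
about indentation at all; it only ever sees the text after the indent.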

Your second choice depends on the fact that PEG parsers are just
functions that take certain arguments and return certain values. You
can write a function like that yourself and use it just like a PEG
nonterminal in your grammar. When I was working on PEG, I actually
thought that it would be nice to make this interface public so that
different parser generators could interoperate, but I never did it.
It's all documented in the PEG Internals section of the manual, though.
However, I'd recommend against this, because I think the interface is
not as good as it should be right now, so I'd probably want to change
it in the future, which would make your code stop working. (Although
if this is a one-time thing, then you don't need to care about that.)

I suppose you also have a third choice, which is to change the
internal interface yourself, then let us make it public, then use it
that way. That's the most elegant solution, but it's more work for
you. I wouldn't recommend it unless the first option is hard and you
want this to last for a long time.

I hope this helps,
Noah
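
The second choice above, a hand-written function that follows the same
calling convention as the generated parsers, might look like this in
outline. This is a Python sketch of the general idea only. The
convention used here (a parser takes the text and a start position and
returns either `None` or a `(new_position, tree)` pair) is an assumption
made for illustration, not the interface documented in the PEG Internals
section, and `literal`, `sequence` and `indent` are hypothetical names.

```python
def literal(expected):
    """Return a parser that matches one exact string at the current position."""
    def parser(text, pos):
        if text.startswith(expected, pos):
            return pos + len(expected), expected
        return None
    return parser


def sequence(*parsers):
    """Return a parser that runs each parser in turn and collects the trees."""
    def parser(text, pos):
        trees = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None  # one failure fails the whole sequence
            pos, tree = result
            trees.append(tree)
        return pos, trees
    return parser


def indent(text, pos):
    """Hand-written parser: consume leading spaces and report their count."""
    end = pos
    while end < len(text) and text[end] == " ":
        end += 1
    return end, end - pos


# The hand-written `indent` slots into a grammar like any other parser.
indented_pass = sequence(indent, literal("pass"))
```

Because every parser shares one signature, the combinators cannot tell a
hand-written function from a generated one, which is what makes mixing
the two possible. It is also why a change to that signature would break
hand-written parsers.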