Fwd: my current project: new lalr module

unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed

* Fwd: my current project: new lalr module
       [not found] <E45D6E7C-3864-4CEB-91DE-2560CA145BF1@verizon.net>
@ 2015-06-04  3:40 ` Matt Wette
  2015-06-04  8:41   ` Nala Ginrut
  2015-06-26 20:43   ` Matt Wette
  0 siblings, 2 replies; 6+ messages in thread
From: Matt Wette @ 2015-06-04  3:40 UTC (permalink / raw)
  To: guile-user

[-- Attachment #1: Type: text/plain, Size: 3908 bytes --]

(Accidentally sent to guile-devel.)

Having fun with all the great code, and documentation, in guile 2. 

My current project is a lalr module. Last December I wanted to start working on an interpreter for modelica in guile.  Guile has a lalr parser generator (system base lalr).   When I started looking at it, I was not crazy about the input language. I looked to see if a better interface could be pasted on, but the code impenetrable for me.  So I started coding up my own LALR parser generator.  I was hoping it would take a month of nights/weekends but ended up taking me close to four months.  For now I am calling my implementation "lalr1".

A major difference between my lalr1 and lalr in guile is that the guile version is a translation of the C-coded bison tool and mine is a from-scratch implementation in Scheme based on algorithms from the Dragon Book.  (The bison implementation uses the DeRemer-Pennello algorithms, yacc does not.  I'd guess the bison-based implementation will generate a parser generator faster, but I think that is a minor advantage.)

Some features of lalr1:
0) It's implemented with syntax-rules and syntax-case (versus define-macro for lalr).

1) There is no specification of terminals.  Terminals are detected in rules (via fenders) as quoted symbols or characters.
    Example of terminals:  'number, #\+

2) Actions are specified as ($$ action ...).  (Identifiers beginning with $ are reserved.) 

3)  Mid-rule actions (Bison Manual section 3.4.8 ) are supported.  (Not supported by lalr in guile.)

4) Some handy "proxy" macros are included.  The macro expression results in a generated symbol and additional productions.  Examples:
    ($? a b c) will generate symbol $Pk and production ($Pk () (a b c)) 
    ($* a b c) will generate symbol $Pk and production ($Pk () ($Pk a b c)) 
    ($+ a b c) will generate symbol $Pk and production ($Pk (a b c) ($Pk a b c))
   where $Pk will be $P0, $P1, ...

5) It can generate Bison and guile lalr input languages from lalr1 spec's.  (I have been using the Bison converter to debug lookahead generation.)

6) There is a macro to generate lalr1 input language from a lalr-parser specification.

7) Table compaction is optional.  (The current compaction procedure will generate a default action if max duplication is more than 3.)

8) The bulk of the code resides in one module of ~ 1500 lines of code and ~ 500 lines of comments.

Here is a partial specification from the C-parser I am working on.  It is being implemented with attributed-grammar semantics so that I can process code with functions from the SXML modules.  For example, I am using foldts to detect typedefs need to feed back to the lexer.)

(define clang-spec
  (lalr-spec
   (start translation-unit-proxy)
   (grammar

    (translation-unit-proxy (translation-unit ($$ (tl->list $1))))

    (translation-unit
     (external-declaration ($$ (make-tl 'trans-unit $1)))
     (translation-unit external-declaration ($$ (tl-app! $1 $2)))
     )

    (external-declaration
     (function-definition)
     (declaration)
     )
   ...
    (declaration
     (declaration-specifier init-declarator-list  ($$ (...)) #\;  ;; <=mid-rule action to capture typedef's
	  ($$ (list 'decl (tl->list $1) (tl->list $2)))))
     (declaration-specifier #\;
          ($$ (list 'decl (tl->list $1))
     )
    ....
    (direct-declarator
     ('ident ($$ (list 'ident $1)))
     (#\( declarator #\) ($$ (list 'scope $2)))
     (direct-declarator #\[ constant-expression #\] ($$ (list 'array $3 $1)))
     (direct-declarator #\[ #\] ($$ (list 'array $3 $1)))
     (direct-declarator #\( parameter-type-list #\) ($$ (list 'function $1 $3)))
     (direct-declarator #\( identifier-list #\) ($$ (list 'function $1 $3)))
     (direct-declarator #\( #\) ($$ (list 'function $1 $3)))
     )
   ....

Matt

[-- Attachment #2: Type: text/html, Size: 7205 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Fwd: my current project: new lalr module
  2015-06-04  3:40 ` Fwd: my current project: new lalr module Matt Wette
@ 2015-06-04  8:41   ` Nala Ginrut
  2015-06-06 17:19     ` Matt Wette
  2015-06-26 20:43   ` Matt Wette
  1 sibling, 1 reply; 6+ messages in thread
From: Nala Ginrut @ 2015-06-04  8:41 UTC (permalink / raw)
  To: Matt Wette; +Cc: guile-user

On Wed, 2015-06-03 at 20:40 -0700, Matt Wette wrote:
> (Accidentally sent to guile-devel.)
> 
> Having fun with all the great code, and documentation, in guile 2. 
> 
> My current project is a lalr module. Last December I wanted to start
> working on an interpreter for modelica in guile.  Guile has a lalr
> parser generator (system base lalr).   When I started looking at it, I
> was not crazy about the input language. I looked to see if a better
> interface could be pasted on, but the code impenetrable for me.  So I
> started coding up my own LALR parser generator.  I was hoping it would
> take a month of nights/weekends but ended up taking me close to four
> months.  For now I am calling my implementation "lalr1".

cool! it'd be a great tool for our multi-lang plan. ;-)
> 
> A major difference between my lalr1 and lalr in guile is that the
> guile
> version is a translation of the C-coded bison tool and mine is a
> from-scratch implementation in Scheme based on algorithms from the
> Dragon Book.  (The bison implementation uses the DeRemer-Pennello
> algorithms, yacc does not.  I'd guess the bison-based implementation
> will generate a parser generator faster, but I think that is a minor
> advantage.)
> 
> Some features of lalr1:
> 0) It's implemented with syntax-rules and syntax-case (versus
> define-macro for lalr).
> 
oh~nice

> 1) There is no specification of terminals.  Terminals are detected in
> rules (via fenders) as quoted symbols or characters.
>     Example of terminals:  'number, #\+

hmm...and strings? Do you have plan to let users choose whether to use
terminals-inference? ;-)
> 

> 6) There is a macro to generate lalr1 input language from a
> lalr-parser specification.
> 
Yes, that would be great!

Happy hacking!

> 
> 
> 
> 
> 





^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: my current project: new lalr module
  2015-06-04  8:41   ` Nala Ginrut
@ 2015-06-06 17:19     ` Matt Wette
  2015-06-08  3:15       ` Nala Ginrut
  0 siblings, 1 reply; 6+ messages in thread
From: Matt Wette @ 2015-06-06 17:19 UTC (permalink / raw)
  To: guile-user


On Jun 4, 2015, at 1:41 AM, Nala Ginrut <nalaginrut@gmail.com> wrote:
> 
>> 1) There is no specification of terminals.  Terminals are detected in
>> rules (via fenders) as quoted symbols or characters.
>>    Example of terminals:  'number, #\+
> 
> hmm...and strings? Do you have plan to let users choose whether to use
> terminals-inference? ;-)

This I will think about.  It should not be difficult.   (I'm wondering what to do with strings like "(hello)" that I might want to convert to symbols.)

Also todo: provide hash functions that would allow the parser and lexer to work with integers instead of symbols.

Matt





^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: my current project: new lalr module
  2015-06-06 17:19     ` Matt Wette
@ 2015-06-08  3:15       ` Nala Ginrut
  0 siblings, 0 replies; 6+ messages in thread
From: Nala Ginrut @ 2015-06-08  3:15 UTC (permalink / raw)
  To: Matt Wette; +Cc: guile-user

On Sat, 2015-06-06 at 10:19 -0700, Matt Wette wrote:
> On Jun 4, 2015, at 1:41 AM, Nala Ginrut <nalaginrut@gmail.com> wrote:
> > 
> >> 1) There is no specification of terminals.  Terminals are detected in
> >> rules (via fenders) as quoted symbols or characters.
> >>    Example of terminals:  'number, #\+
> > 
> > hmm...and strings? Do you have plan to let users choose whether to use
> > terminals-inference? ;-)
> 
> This I will think about.  It should not be difficult.   (I'm wondering what to do with strings like "(hello)" that I might want to convert to symbols.)

IIRC, this is implementation specific, say, #{\x28;hello\x29;}#
You can get it from string->symbol. And #{...}# notation is Guile
specific.
But you don't have to worry about the portability, since you may use
cond-expand.


Best regards.






^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: my current project: new lalr module
  2015-06-04  3:40 ` Fwd: my current project: new lalr module Matt Wette
  2015-06-04  8:41   ` Nala Ginrut
@ 2015-06-26 20:43   ` Matt Wette
  2015-07-20  0:36     ` Matt Wette
  1 sibling, 1 reply; 6+ messages in thread
From: Matt Wette @ 2015-06-26 20:43 UTC (permalink / raw)
  To: guile-user

[-- Attachment #1: Type: text/plain, Size: 2981 bytes --]

On Jun 3, 2015, at 8:40 PM, Matt Wette <matthew.wette@verizon.net> wrote:
> My current project is a lalr module. 

Here is a slight update on my parser generator. 
1) I have added a "hashify" procedure that allows one to set up the parser and lex'er to use integers instead of symbols.
2) I have updated the specification parser and processor to allow use of strings.  My C-parser is using "++" and "+=" now.
3) I have updated the lexical analyzer tools to allow (I hope) easier construction of lexical analyzers.   The idea is that the user
    generates readers for identifiers, numbers, other character sequences, comments, etc and then throws those, along with a
   match-table provided by the parser-generator, to a procedure to make a lexical analyzer.

Still thinking about a name.  I may call it NYACC: "Not yet another compiler compiler!"

Here is the lexer:
    (lambda ()
      (let iter ((ch (read-char)))
        (cond
         ((eof-object? ch) (mapval (cons '$end ch)))
         ((char-set-contains? c:ws ch) (iter (read-char)))
         ((read-ident ch) => (lambda (s) (or (assq (string->symbol s) symtab)
                                             (cons '$ident s))))
         ((read-num ch) => mapval)
         ((read-chseq ch) => identity)
         ((read-string ch) => mapval)
         ((read-comm ch) => identity)
         ((skip-comm ch) => (iter (read-char)))
         ((assq-ref chrtab ch) => (lambda (t) (cons t ch)))
         (else (cons ch ch)))))))

And here is the parser when using the hashified interface:
       (lambda (lexr)
          (let iter ((state (list 0))           ; state stack
                     (stack (list '$dummy))     ; sval stack
                     (nval #f)     ; prev reduce to non-terminal value
                     (lval (lexr)))        ; lexical value (from lex'er)
            (let* ((tval (if nval (car nval) (car lval))) ; token (syntax value)
                   (sval (if nval (cdr nval) (cdr lval))) ; semantic value
                   (stxl (vector-ref pat-v (car state))) ; state transition list
                   (stx (or (assq-ref stxl tval) ; trans action (e.g. shift 32)
                            (assq-ref stxl 0) ; default action 
                            (cons 'error 0))))    ; else error
              (when pdbug (dmsg (car state) tval stx))
              (cond
                ((positive? stx)
                 (iter (cons stx state) (cons sval stack)
                       #f (if nval lval (lexr))))
                ((negative? stx)
                 (let* ((gx (abs stx)) (gl (vector-ref len-v gx))
                        ($$ (apply (vector-ref act-v gx) stack)))
                   (iter (list-tail state gl)
                         (list-tail stack gl)
                         (cons (vector-ref rto-v gx) $$)
                         lval)))
                ((zero? stx) ;; accept
                 (car stack))))))
Matt

           

[-- Attachment #2: Type: text/html, Size: 8176 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: my current project: new lalr module
  2015-06-26 20:43   ` Matt Wette
@ 2015-07-20  0:36     ` Matt Wette
  0 siblings, 0 replies; 6+ messages in thread
From: Matt Wette @ 2015-07-20  0:36 UTC (permalink / raw)
  To: guile-user

[-- Attachment #1: Type: text/plain, Size: 1719 bytes --]


On Jun 26, 2015, at 1:43 PM, Matt Wette <matthew.wette@verizon.net> wrote:
> On Jun 3, 2015, at 8:40 PM, Matt Wette <matthew.wette@verizon.net> wrote:
>> My current project is a lalr module. 
> 
> Here is a slight update on my parser generator. 
> 1) I have added a "hashify" procedure that allows one to set up the parser and lex'er to use integers instead of symbols.
> 2) I have updated the specification parser and processor to allow use of strings.  My C-parser is using "++" and "+=" now.
> 3) I have updated the lexical analyzer tools to allow (I hope) easier construction of lexical analyzers.   

More updates:
1) I reworked the export to Bison.  I 
2) I started working on user/hacker manual
3) I am working on some parsers: C, Matlab, Modelica, Ecmascript.
    (When I run my exports modelica grammar through bison I get the same S/R and R/R conflicts.  => vote of confidence)
4) I am working on  writing to standalone file (a la bison for C).
5)  I have submitted much of the code to savannah.   (I have not included the in-work parsers.)
Bugs surely remain.

Here is sample output from the writer for generated parser file (in work) for my C parser:

(define len-v
  #(1 1 5 0 1 2 1 2 1 2 1 2 1 3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 
    ...))

(define pat-v
  #(((8 . 1) (9 . 2) (58 . 3) (60 . 4) (61 . 5) (62 . 6) (65 . 7) (66 . 8) (
    ...))

(define act-v
  #(;; state 0
    (lambda ($1 . $rest) $1)
    ...
    ;; state 3
    (lambda ($2 $1 . $rest)
      (let ((decl `(decl ,(tl->list $1) ,(tl->list $2))))
        (for-each add-typename (find-new-typenames decl))
        decl))
    ...
    ;; state 322
    (lambda ($1 . $rest) `(cpp-stmt ,$1))


[-- Attachment #2: Type: text/html, Size: 4274 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-07-20  0:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <E45D6E7C-3864-4CEB-91DE-2560CA145BF1@verizon.net>
2015-06-04  3:40 ` Fwd: my current project: new lalr module Matt Wette
2015-06-04  8:41   ` Nala Ginrut
2015-06-06 17:19     ` Matt Wette
2015-06-08  3:15       ` Nala Ginrut
2015-06-26 20:43   ` Matt Wette
2015-07-20  0:36     ` Matt Wette

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).