unofficial mirror of guile-user@gnu.org 
* Q on (language <lang> spec)
@ 2015-10-17 19:02 Matt Wette
  2015-10-18 20:00 ` Matt Wette
  0 siblings, 1 reply; 5+ messages in thread
From: Matt Wette @ 2015-10-17 19:02 UTC (permalink / raw)
  To: guile-user


I am playing with the compiler tower and have been digging through the (system base language) module to get a handle on writing a language for the compiler tower.  Here is the define-language signature:

define-language 
                       [#:name] [#:title] [#:reader] [#:printer]
                       [#:parser=#f] [#:compilers='()]
                       [#:decompilers='()] [#:evaluator=#f]
                       [#:joiner=#f] [#:for-humans?=#t]
                       [#:make-default-environment=make-fresh-user-module]

Here are my assumptions.   I’d appreciate corrections if I have missed something.

reader is a procedure that must be provided.  It takes an input port and an environment and returns a form of the implementer's choice.  The text read from the input port is (nominally) in the supported language.

parser is an optional procedure.  If provided, it takes the form produced by the reader and returns another form of the implementer's choice.

compilers is an a-list of (symbol . procedure) pairs.  For each symbol, the associated procedure takes the form produced by the parser (or by the reader, if no parser is provided) and generates code in the language named by the symbol.  For example, if no parser is defined, an entry of `(tree-il . ,compile-tree-il) means the implementer provides a procedure compile-tree-il that takes a form (returned by the reader), an environment, and options (an a-list?) and generates tree-il.

decompilers is an a-list of (symbol . procedure) pairs.  The procedure takes an expression in the symbol-designated form, along with an environment and an options a-list, and returns something in the implementer's intermediate form (the output of the parser, or of the reader if no parser is specified).

What did I miss or get wrong?   I have not been digging to figure out joiner or evaluator yet.
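
The assumptions above can be sketched as a minimal language definition. This is only an illustration under stated assumptions: the names toy, toy-reader and toy->tree-il are made up, the "language" is just Scheme data, and the three return values of the compiler (compiled form, environment, continuation environment) follow the convention described in the Guile manual for compiler procedures.

```scheme
(use-modules (system base language)   ; define-language and its accessors
             (language tree-il))      ; parse-tree-il, unparse-tree-il

;; Hypothetical reader: takes an input port and an environment and
;; returns one form.  Here the source text is simply Scheme data.
(define (toy-reader port env)
  (read port))

;; Hypothetical compiler: takes the form from the reader, an environment
;; and an options list, and returns three values: the compiled form, the
;; environment, and a continuation environment (the same one here).
(define (toy->tree-il exp env opts)
  (values (parse-tree-il `(const ,exp)) env env))

;; Only #:title, #:reader and #:printer are needed besides the name;
;; #:compilers maps a target language name to a compiler procedure.
(define-language toy
  #:title     "toy"
  #:reader    toy-reader
  #:compilers `((tree-il . ,toy->tree-il))
  #:printer   write)
```

Calling (toy-reader port #f) on a port holding the text 42 returns the number 42, and toy->tree-il wraps it in a tree-il constant. No parser is given, so the compiler receives the reader's output directly.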

I have been able to do the following, but I am not sure I have things laid out correctly yet:
scheme@(guile-user)> ,L javascript
Happy hacking with javascript!  To switch back, type `,L scheme'.
javascript@(guile-user)> var abc = 123
javascript@(guile-user)> ,L scheme
Happy hacking with Scheme!  To switch back, type `,L javascript'.
scheme@(guile-user)> abc
$1 = 123

Matt




* Re: Q on (language <lang> spec)
  2015-10-17 19:02 Q on (language <lang> spec) Matt Wette
@ 2015-10-18 20:00 ` Matt Wette
  2015-10-19  3:53   ` Nala Ginrut
  0 siblings, 1 reply; 5+ messages in thread
From: Matt Wette @ 2015-10-18 20:00 UTC (permalink / raw)
  To: guile-user



> On Oct 17, 2015, at 12:02 PM, Matt Wette <matthew.wette@verizon.net> wrote:
> I am playing with the compiler tower and have been digging through the (system base language) module to get a handle on writing a language for the compiler tower.

Here is a simple calculator example that I have working with my own intermediate (SXML-based) language.

scheme@(guile-user)> ,L calc
Happy hacking with calc!  To switch back, type `,L scheme'.
calc@(guile-user)> a = (2.5 + 4.5)/(9.3 - 1)
calc@(guile-user)> ,L scheme
Happy hacking with Scheme!  To switch back, type `,L calc'.
scheme@(guile-user)> a
$1 = 0.8433734939759036

The implementation consists of the files spec.scm, parser.scm and compiler.scm, which are listed below.

All files carry the following header:

;;; Copyright (C) 2015 Matthew R. Wette
;;;
;;; This program is free software: you can redistribute it and/or modify
;;; it under the terms of the GNU General Public License as published by
;;; the Free Software Foundation, either version 3 of the License, or
;;; (at your option) any later version.

They will appear at https://savannah.nongnu.org/projects/nyacc.

spec.scm:
(define-module (language calc spec)
  #:export (calc)
  #:use-module (system base language)
  #:use-module (nyacc lang calc parser)
  #:use-module (nyacc lang calc compiler))

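;; Reader for the compiler tower: takes an input port and an environment
;; and returns one parsed form (here, an SXML tree).  calc-parse reads from
;; the current input port, so rebind it to PORT for the duration of the parse.
(define (calc-reader port env)
  (let ((iport (current-input-port)))
    (dynamic-wind
        (lambda () (set-current-input-port port))
        (lambda () (calc-parse #:debug #f))
        (lambda () (set-current-input-port iport)))))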

(define-language calc
  #:title       "calc"
  #:reader      calc-reader
  #:compilers   `((tree-il . ,calc-sxml->tree-il))
  #:printer     write)


parser.scm:
(define-module (nyacc lang calc parser)
  #:export (calc-parse calc-spec calc-mach)
  #:use-module (nyacc lalr)
  #:use-module (nyacc lex)
  #:use-module (nyacc parse))

(define calc-spec
  (lalr-spec
   (prec< (left "+" "-") (left "*" "/"))
   (start stmt-list-proxy)
   (grammar

    (stmt-list-proxy
     (stmt-list "\n" ($$ (cons 'stmt-list (reverse $1)))))

    (stmt-list
     (stmt ($$ (list $1)))
     (stmt-list ";" stmt ($$ (cons $3 $1))))

    (stmt
     (ident "=" expr ($$ `(assn-stmt ,$1 ,$3)))
     (expr ($$ `(expr-stmt ,$1)))
     ( ($$ '(empty-stmt))))

    (expr
     (expr "+" expr ($$ `(add ,$1 ,$3)))
     (expr "-" expr ($$ `(sub ,$1 ,$3)))
     (expr "*" expr ($$ `(mul ,$1 ,$3)))
     (expr "/" expr ($$ `(div ,$1 ,$3)))
     ('$fixed ($$ `(fixed ,$1)))
     ('$float ($$ `(float ,$1)))
     ("(" expr ")" ($$ $2)))

    (ident ('$ident ($$ `(ident ,$1))))
    )))

(define calc-mach
  (compact-machine
   (hashify-machine
     (make-lalr-machine calc-spec))))

(define calc-parse
  (let ((gen-lexer (make-lexer-generator (assq-ref calc-mach 'mtab)
                                         #:space-chars " \t"))
        (parser (make-lalr-ia-parser calc-mach)))
    (lambda* (#:key (debug #f)) (parser (gen-lexer) #:debug debug))))


compiler.scm:
(define-module (nyacc lang calc compiler)
  #:export (calc-sxml->tree-il)
  #:use-module (sxml match)
  #:use-module (sxml fold)
  ;;#:use-module (system base language)
  #:use-module (language tree-il))

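;; Bottom-up rewrite: by the time a node reaches fup, its children have
;; already been converted to the external (s-expression) form of tree-il.
;; Note: `apply' is the Guile 2.0 spelling; Guile 2.2 and later use `call'.
(define (fup tree)
  (sxml-match tree
    ((fixed ,fx) `(const ,(string->number fx)))
    ((float ,fl) `(const ,(string->number fl)))
    ((ident ,id) `(toplevel ,(string->symbol id)))
    ((add ,lt ,rt) `(apply (toplevel +) ,lt ,rt))
    ((sub ,lt ,rt) `(apply (toplevel -) ,lt ,rt))
    ((mul ,lt ,rt) `(apply (toplevel *) ,lt ,rt))
    ((div ,lt ,rt) `(apply (toplevel /) ,lt ,rt))
    ((assn-stmt (toplevel ,lhs) ,rhs) `(define ,lhs ,rhs))
    ;; An expression statement evaluates to its expression; without this
    ;; clause the expr-stmt node would reach parse-tree-il unchanged.
    ((expr-stmt ,ex) ex)
    ((empty-stmt) '(begin))
    ((stmt-list ,items ...) `(begin ,items ...))
    (,otherwise tree)))

;; Compiler procedure: takes the reader's form, an environment and options;
;; returns the compiled form, the environment and a continuation environment.
(define (calc-sxml->tree-il exp env opts)
  (let* ((tree (foldt fup identity exp))
         (code (parse-tree-il tree)))
    (values code env env)))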
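
The rewrite performed by the compiler above can be tried without nyacc or (sxml match). The following is a minimal sketch using (ice-9 match); calc->tree-il-sexp and binop are hypothetical helpers, the example tree is what the grammar actions above would presumably produce for the REPL input, and the output keeps the Guile 2.0 `apply' form used in this thread.

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper: build a tree-il call to the toplevel procedure OP.
(define (binop op l r)
  `(apply (toplevel ,op) ,(calc->tree-il-sexp l) ,(calc->tree-il-sexp r)))

;; Hypothetical helper, not part of nyacc: convert a calc parse tree into
;; the external (s-expression) form of tree-il, converting children first.
(define (calc->tree-il-sexp tree)
  (match tree
    (('fixed str) `(const ,(string->number str)))
    (('float str) `(const ,(string->number str)))
    (('ident name) `(toplevel ,(string->symbol name)))
    (('add l r) (binop '+ l r))
    (('sub l r) (binop '- l r))
    (('mul l r) (binop '* l r))
    (('div l r) (binop '/ l r))
    (('assn-stmt ('ident name) rhs)
     `(define ,(string->symbol name) ,(calc->tree-il-sexp rhs)))
    (('expr-stmt e) (calc->tree-il-sexp e))
    (('empty-stmt) '(begin))
    (('stmt-list stmts ...) `(begin ,@(map calc->tree-il-sexp stmts)))))

;; Assumed parse tree for the input: a = (2.5 + 4.5)/(9.3 - 1)
(define example
  '(stmt-list
    (assn-stmt (ident "a")
               (div (add (float "2.5") (float "4.5"))
                    (sub (float "9.3") (fixed "1"))))))

(calc->tree-il-sexp example)
;; => (begin (define a (apply (toplevel /)
;;                            (apply (toplevel +) (const 2.5) (const 4.5))
;;                            (apply (toplevel -) (const 9.3) (const 1)))))
```

The result is plain list data, so it can be inspected at the REPL before being handed to parse-tree-il.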



* Re: Q on (language <lang> spec)
  2015-10-18 20:00 ` Matt Wette
@ 2015-10-19  3:53   ` Nala Ginrut
  2015-10-23  0:20     ` Matt Wette
  0 siblings, 1 reply; 5+ messages in thread
From: Nala Ginrut @ 2015-10-19  3:53 UTC (permalink / raw)
  To: Matt Wette; +Cc: guile-user

Nice work!
For a more general discussion of multi-language support, I can share some
opinions.

Most of the time, we just need to convert our AST/IR to tree-il. But I
saw that the old guile-lua frontend added something at a lower level. I
haven't looked into it more deeply.

There are two equivalent forms of tree-il: the s-expr form and the record
form. Personally, I like the s-expr one; it's simple to use. But the record
form can store the source meta info, which is considered better.

And I'm glad that you wrote a new lexer generator (before it I only knew
silex), it's great! Would you like to make the generated tokens
compatible with scm-lalr? If so, people could rewrite their lexer module
with your lexer generator without needing to rewrite the parser. I saw the
token name is a string rather than a symbol, so I guess it's not compatible
with scm-lalr.

Happy hacking!





* Re: Q on (language <lang> spec)
  2015-10-19  3:53   ` Nala Ginrut
@ 2015-10-23  0:20     ` Matt Wette
  2015-10-23 13:10       ` Matt Wette
  0 siblings, 1 reply; 5+ messages in thread
From: Matt Wette @ 2015-10-23  0:20 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-user



> On Oct 18, 2015, at 8:53 PM, Nala Ginrut <nalaginrut@gmail.com> wrote:
> And I'm glad that you wrote a new lexer generator (before it I only knew
> silex), it's great! Would you like to make the generated tokens
> compatible with scm-lalr? If so, people could rewrite their lexer module
> with your lexer generator without needing to rewrite the parser. I saw the
> token name is a string rather than a symbol, so I guess it's not compatible
> with scm-lalr.

Actually, the lexer generator follows the convention of internally turning certain lexemes into symbols: strings become '$string and integers become '$fixed.  The argument to the lexer generator is a "match table" that says how to map the items read, whether identifiers (e.g., "while") or character sequences (e.g., "+="), to something the parser wants to see.  For example, if you use the symbol WHILE to denote the source text "while", then you would have an entry ("while" . WHILE) in the match table.  So I think the lexer generator should be adaptable to other parsers.

As a side note, a parser produced by the nyacc parser generator can be "hashified", which means the lexer should return integers.  In that case the match table has entries that look like ("while" . 45).

Matt



* Re: Q on (language <lang> spec)
  2015-10-23  0:20     ` Matt Wette
@ 2015-10-23 13:10       ` Matt Wette
  0 siblings, 0 replies; 5+ messages in thread
From: Matt Wette @ 2015-10-23 13:10 UTC (permalink / raw)
  To: guile-user

[-- Attachment #1: Type: text/plain, Size: 4010 bytes --]


> On Oct 22, 2015, at 5:20 PM, Matt Wette <matthew.wette@verizon.net> wrote:
> 
> Actually, the lexer generator follows the convention of internally turning certain lexemes into symbols: strings become '$string and integers become '$fixed.  The argument to the lexer generator is a "match table" that says how to map the items read, whether identifiers (e.g., "while") or character sequences (e.g., "+="), to something the parser wants to see.  For example, if you use the symbol WHILE to denote the source text "while", then you would have an entry ("while" . WHILE) in the match table.  So I think the lexer generator should be adaptable to other parsers.


I didn't describe this very well.  I will try again.  The code actually provides a lexical analyzer (aka lexer) generator-generator.  To make a lexer generator, you call make-lexer-generator with a match table as argument:

  (define gen-lexer (make-lexer-generator match-table))

Then you pass a freshly generated lexer each time you call the parser:

  (parse (gen-lexer))

The reason is that the lexer keeps state information (e.g., the beginning-of-line condition).  The match table argument indicates how the user wants lexemes, read from the input, to be reported to the parser.  If you want "while" in the input to be reported as 'WHILE to the parser, then the match table would include an entry '("while" . WHILE).  The generator uses special symbols to represent quoted strings, numbers and comments.  If you want quoted strings returned with the symbol 'STRING, then the match table would include an entry '($string . STRING).
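
The two-level pattern described above can be sketched with a toy stand-in. This is not nyacc's implementation: make-toy-lexer-generator, read-word and lex-all are hypothetical, the toy only splits on whitespace, and it returns tokens as (type . text) pairs. It shows how the match table alone decides what the parser sees, and how each call to the generator yields a fresh lexer closure.

```scheme
;; Read one whitespace-delimited word from PORT, or return the EOF object.
(define (read-word port)
  (let skip ()
    (let ((c (peek-char port)))
      (when (and (char? c) (char-whitespace? c))
        (read-char port)
        (skip))))
  (let loop ((chars '()))
    (let ((c (peek-char port)))
      (cond
       ((and (char? c) (not (char-whitespace? c)))
        (read-char port)
        (loop (cons c chars)))
       ((null? chars) c)                      ; nothing read: C is EOF
       (else (list->string (reverse chars)))))))

;; Hypothetical stand-in for make-lexer-generator: returns a thunk that
;; creates a new lexer, which reads from the current input port.
(define (make-toy-lexer-generator match-table)
  (lambda ()
    ;; A real lexer would keep its per-parse state (such as a
    ;; beginning-of-line flag) in variables bound here.
    (lambda ()
      (let ((word (read-word (current-input-port))))
        (cond
         ((eof-object? word) (cons (assq-ref match-table '$end) "<eof>"))
         ((assoc-ref match-table word) => (lambda (tok) (cons tok word)))
         ((string->number word) (cons (assq-ref match-table '$fixed) word))
         (else (cons (assq-ref match-table '$ident) word)))))))

;; Collect every token of TEXT, up to and including END-TOKEN.
(define (lex-all gen-lexer end-token text)
  (with-input-from-string text
    (lambda ()
      (let ((lexer (gen-lexer)))
        (let loop ((acc '()))
          (let ((tok (lexer)))
            (if (eqv? (car tok) end-token)
                (reverse (cons tok acc))
                (loop (cons tok acc)))))))))

(define gen-lexer
  (make-toy-lexer-generator
   '(($fixed . NUM) ($ident . ID) ($end . END)
     ("while" . WHILE) ("+=" . PLUS-EQ))))

(lex-all gen-lexer 'END "while x += 1")
;; => ((WHILE . "while") (ID . "x") (PLUS-EQ . "+=") (NUM . "1") (END . "<eof>"))
```

Swapping in a table whose right-hand sides are integers makes the same lexer report integers, which is the shape a hashified parser expects.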

In many cases I have nyacc "hashify" my parser so that it uses integers instead of symbols.  Here is the match table generated for the hashified matlab parser:

(define mtab
  '(($lone-comm . 1) ($string . 2) ($float . 3) ($fixed . 4) ($ident . 5)
    (";" . 6) (".'" . 7) ("'" . 8) ("~" . 9) (".^" . 10) (".\\" . 11)
    ("./" . 12) (".*" . 13) ("^" . 14) ("\\" . 15) ("/" . 16) ("*" . 17)
    ("-" . 18) ("+" . 19) (">=" . 20) ("<=" . 21) (">" . 22) ("<" . 23)
    ("~=" . 24) ("==" . 25) ("&" . 26) ("|" . 27) (":" . 28) ("case" . 29)
    ("elseif" . 30) ("clear" . 31) ("global" . 32) ("return" . 33)
    ("otherwise" . 34) ("switch" . 35) ("else" . 36) ("if" . 37)
    ("while" . 38) ("for" . 39) ("," . 40) (")" . 41) ("(" . 42) ("=" . 43)
    ("]" . 44) ("[" . 45) ("function" . 46) (#\newline . 47) ("end" . 48)
    ($end . 49)))

and here is the match table generated for the non-hashified parser for the same language:

(define mtab
  '(($lone-comm . $lone-comm) ($string . $string) ($float . $float)
    ($fixed . $fixed) ($ident . $ident) (";" . #{$:;}#) (".'" . $:.')
    ("'" . $:') ("~" . $:~) (".^" . $:.^) (".\\" . $:.\) ("./" . $:./)
    (".*" . $:.*) ("^" . $:^) ("\\" . $:\) ("/" . $:/) ("*" . $:*)
    ("-" . $:-) ("+" . $:+) (">=" . $:>=) ("<=" . $:<=) (">" . $:>)
    ("<" . $:<) ("~=" . $:~=) ("==" . $:==) ("&" . $:&) ("|" . $:|)
    (":" . $::) ("case" . $:case) ("elseif" . $:elseif) ("clear" . $:clear)
    ("global" . $:global) ("return" . $:return) ("otherwise" . $:otherwise)
    ("switch" . $:switch) ("else" . $:else) ("if" . $:if) ("while" . $:while)
    ("for" . $:for) ("," . $:,) (")" . #{$:\x29;}#) ("(" . #{$:\x28;}#)
    ("=" . $:=) ("]" . #{$:\x5d;}#) ("[" . #{$:\x5b;}#)
    ("function" . $:function) (#\newline . #\newline) ("end" . $:end)
    ($end . $end)))
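
The relationship between the two tables can be illustrated with a small sketch. The helper hashify-match-table is hypothetical and simply numbers the entries in order; how nyacc actually assigns the integers is not shown in this thread, so the numbering here is an assumption.

```scheme
;; Hypothetical helper: replace the right-hand side of each match-table
;; entry with a consecutive integer, starting at 1.
(define (hashify-match-table table)
  (let loop ((entries table) (n 1) (acc '()))
    (if (null? entries)
        (reverse acc)
        (loop (cdr entries)
              (+ n 1)
              (cons (cons (car (car entries)) n) acc)))))

(hashify-match-table '(($fixed . $fixed) ("+" . $:+) ($end . $end)))
;; => (($fixed . 1) ("+" . 2) ($end . 3))
```

The left-hand sides (what the lexer recognizes) stay the same; only the values reported to the parser change.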

