From: Matt Wette
Newsgroups: gmane.lisp.guile.user
Subject: Re: Q on (language spec)
Date: Fri, 23 Oct 2015 06:10:51 -0700
Message-ID: <7F1E38C4-E065-4858-B190-FB8431DB223A@verizon.net>
In-reply-to: <040AE36A-A8DC-4E37-BB91-EF83C1ADBBDB@verizon.net>
To: guile-user@gnu.org

> On Oct 22, 2015, at 5:20 PM, Matt Wette <matthew.wette@verizon.net> wrote:
>
>> On Oct 18, 2015, at 8:53 PM, Nala Ginrut <nalaginrut@gmail.com> wrote:
>> And I'm glad that you wrote a new lexer generator (before it I only knew
>> silex), it's great! Would you like to make the generated tokens
>> compatible with scm-lalr? If so, people may rewrite their lexer module
>> with your lexer generator, and no need to rewrite the parser. I saw the
>> token name is string rather than symbol, so I guess it's not compatible
>> with scm-lalr.
>
> Actually, the lexer-generator uses the convention of internally turning certain lexemes, like strings, into symbols like '$string, or integers into '$fixed. The argument to the lexer-generator is a "match-table" which says how to map the items read, whether identifiers (e.g., "while") or character sequences (e.g., "+="), to something the parser wants to see. For example, if you use the symbol WHILE to denote the source text "while" then you would have an entry '("while" . WHILE) in the match table. So I think the lexer-generator should be adaptable to other parsers.

I didn't describe this very well. I will try again. The code actually provides a lexical analyzer (aka lexer) generator-generator. To make a lexer generator, you call make-lexer-generator with a match table as its argument:

  (define gen-lexer (make-lexer-generator match-table))

Then you pass a newly generated lexer each time you call the parser:

  (parse (gen-lexer))

The reason is that the lexer keeps state information (e.g., the beginning-of-line condition), so each parse needs its own lexer.

Now, the match table argument indicates how the user wants lexemes read from the input to be reported to the parser. If you want "while" in the input to be reported as 'WHILE to the parser, then the match table would include an entry '("while" . WHILE). The generator uses special symbols, such as $string and $fixed, to represent quoted strings, numbers and comments. If you want quoted strings returned with the symbol 'STRING, then the match table would include an entry '($string . STRING).
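
To make the two levels concrete, here is a minimal sketch of the generator-generator shape. It is not nyacc's implementation: make-toy-lexer-generator is a hypothetical name, the toy generator takes a pre-split list of lexemes as an argument (the real gen-lexer above is called with no arguments), and the "state" is just the remaining input, standing in for whatever a real lexer tracks.

```scheme
;; Toy illustration only: NOT nyacc's make-lexer-generator.
;; The input is a pre-split list of lexemes rather than a character port.
(define (make-toy-lexer-generator match-table)
  ;; Returns a generator; every call to it builds a lexer with fresh state.
  (lambda (lexemes)
    (let ((remaining lexemes))          ; per-lexer state
      (lambda ()
        (if (null? remaining)
            ;; End of input: report whatever $end maps to.
            (cons (assq-ref match-table '$end) "")
            (let* ((lx (car remaining))
                   (hit (assoc lx match-table)))  ; string keys, equal?
              (set! remaining (cdr remaining))
              (if hit
                  (cons (cdr hit) lx)
                  ;; Anything not listed is reported as an identifier.
                  (cons (assq-ref match-table '$ident) lx))))))))

(define gen-lexer
  (make-toy-lexer-generator
   '(("while" . WHILE) ("+=" . PLUS-EQ) ($ident . IDENT) ($end . END))))

;; Two lexers from the same generator do not share state.
(define lex1 (gen-lexer '("while" "x" "+=")))
(define lex2 (gen-lexer '("while")))
```

Each lexer returns a (token . lexeme) pair per call, and advancing lex1 has no effect on lex2, which is why a fresh lexer is generated for every parse.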

In many cases I have nyacc "hashify" my parser so that it uses integers instead of symbols. Here is the match table generated for the hashified MATLAB parser:

(define mtab
  '(($lone-comm . 1) ($string . 2) ($float . 3) ($fixed . 4) ($ident . 5)
    (";" . 6) (".'" . 7) ("'" . 8) ("~" . 9) (".^" . 10) (".\\" . 11)
    ("./" . 12) (".*" . 13) ("^" . 14) ("\\" . 15) ("/" . 16) ("*" . 17)
    ("-" . 18) ("+" . 19) (">=" . 20) ("<=" . 21) (">" . 22) ("<" . 23)
    ("~=" . 24) ("==" . 25) ("&" . 26) ("|" . 27) (":" . 28) ("case" . 29)
    ("elseif" . 30) ("clear" . 31) ("global" . 32) ("return" . 33)
    ("otherwise" . 34) ("switch" . 35) ("else" . 36) ("if" . 37)
    ("while" . 38) ("for" . 39) ("," . 40) (")" . 41) ("(" . 42) ("=" . 43)
    ("]" . 44) ("[" . 45) ("function" . 46) (#\newline . 47) ("end" . 48)
    ($end . 49)))


And here is the match table generated for the non-hashified parser for the same language:

(define mtab
  '(($lone-comm . $lone-comm) ($string . $string) ($float . $float)
    ($fixed . $fixed) ($ident . $ident) (";" . #{$:;}#) (".'" . $:.')
    ("'" . $:') ("~" . $:~) (".^" . $:.^) (".\\" . $:.\) ("./" . $:./)
    (".*" . $:.*) ("^" . $:^) ("\\" . $:\) ("/" . $:/) ("*" . $:*)
    ("-" . $:-) ("+" . $:+) (">=" . $:>=) ("<=" . $:<=) (">" . $:>)
    ("<" . $:<) ("~=" . $:~=) ("==" . $:==) ("&" . $:&) ("|" . $:|)
    (":" . $::) ("case" . $:case) ("elseif" . $:elseif) ("clear" . $:clear)
    ("global" . $:global) ("return" . $:return) ("otherwise" . $:otherwise)
    ("switch" . $:switch) ("else" . $:else) ("if" . $:if) ("while" . $:while)
    ("for" . $:for) ("," . $:,) (")" . #{$:\x29;}#) ("(" . #{$:\x28;}#)
    ("=" . $:=) ("]" . #{$:\x5d;}#) ("[" . #{$:\x5b;}#)
    ("function" . $:function) (#\newline . #\newline) ("end" . $:end)
    ($end . $end)))
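
In the two tables above, the integer codes 1 through 49 follow the same entry order as the symbol version. A minimal sketch of that relationship, assuming codes are simply assigned in order of appearance, might look like the following. toy-hashify-mtab is a hypothetical helper for illustration only and is not how nyacc itself performs hashification.

```scheme
;; Hypothetical sketch, NOT nyacc's hashify: replace each terminal in a
;; match table with an integer code, numbering entries from 1 in order.
(define (toy-hashify-mtab mtab)
  (let loop ((entries mtab) (n 1) (acc '()))
    (if (null? entries)
        (reverse acc)
        (loop (cdr entries)
              (+ n 1)
              ;; Keep the key (string, char or $-symbol), swap the value.
              (cons (cons (car (car entries)) n) acc)))))

(toy-hashify-mtab '(($ident . IDENT) ("while" . WHILE) ($end . END)))
;; => (($ident . 1) ("while" . 2) ($end . 3))
```

The lexer does not care which form it is given: it only looks up the key and reports the associated value, so the same lexer generator serves both the symbol-based and the integer-based parser.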

