Start of Parsing
Table of Contents
1. Parser
1.1. How it works
Model
program -> [parser] -> Abstract Syntax tree (AST) -> [Interp] -> Result
transcript and Host Language langauge
EX: Python is the transcript langauge and the host language c
1+2 -> (plusC (NumC1) -> 3 (NumC2))
1.2. extended Backus–Naur form (EBNF)
Stuff in curly angle brackets are variables
<expr>::=<num>
| {* <expr> <expr>}
| {+ <expr> <expr>}
{* {+ 2 3} 7} -> the second type of expression
Actually, all of these are expressions defined earlier!
The expression can be split into a tree where the <num> is the leaf.
1.3. Now in Racket
NOTE ’()’ and ’{}’ are interchangable in racket
;;1. Data Definitions
(define-type ExprC (U NumC PlusC MultC))
(struct NumC ([n : Real]) #:transparent)
(struct PlusC ([left : ExprC] [right : ExprC]) #:transparent)
(struct MultC ([left : ExprC] [right : ExprC]) #:transparent)
;;2. Purpose Statement and header
;;parse the given program into an AST
(define (parse [prog : Sexp]) : ExprC
(match prog
;;<num>
[(? real? n) (NumC n)]
;; {+ <expr> <expr>}
[(list '+ l r) (Plusc (parse l) (parse r))]
;; {* <expr> <expr>}
[(list '* l r) (MultC (parse l) (parse r))]))
;;3. tests
(check-equal? (parse '{* {+ 2 3} 7})
(MultC (PlusC (NumC 2) (NumC 3))
(NumC 7)))
(check-equal? (parse '2) (NumC 2))
(check-equal? (parse '{+ 3 4})
(PlusC (NumC 3) (NumC 4)))
Why is the program type a String?
- Parsing Traditionally : lexer/tokenizer
- { * { + 2 3 } 7} <BRACE, ASTERISK, LBRACE, PLUS, Num(2), Num(3), …
- Tokenizer: chuncks string into individual tokens.
- Parser
- WE DONT WORRY ABOUT THIS LET RACKET HANDLE THE TOKENIZING
So we use Sexp or Sexpression instead!